(54) Title: METHODS AND COMPOSITIONS FOR POLYPEPTIDE ENGINEERING
(57) Abstract 



Methods are provided for the evolution of proteins of industrial and pharmaceutical interest, including methods for effecting 
recombination and selection. Compositions produced by these methods are also disclosed. 
METHODS AND COMPOSITIONS
FOR POLYPEPTIDE ENGINEERING

Background of the Invention

Recursive sequence recombination entails performing iterative cycles
of recombination and screening or selection to "evolve" individual genes,
whole plasmids or viruses, multigene clusters, or even whole genomes
(Stemmer, Bio/Technology 13:549-553 (1995)). Such techniques do not
require the extensive analysis and computation required by conventional
methods for polypeptide engineering. Recursive sequence recombination
allows the recombination of large numbers of mutations in a minimum
number of selection cycles, in contrast to traditional, pairwise recombination
events.

Thus, recursive sequence recombination (RSR) techniques provide
particular advantages in that they provide recombination between mutations
in any or all of these substrates, thereby providing a very fast way of exploring
how different combinations of mutations can affect a desired
result.

In some instances, however, structural and/or functional information is
available which, although not required for recursive sequence recombination,
provides opportunities for modifications of the technique. In other instances,
selection and/or screening of a large number of recombinants can be costly or
time-consuming. A further problem can be the manipulation of large nucleic
acid molecules. The instant invention addresses these issues and others.

Summary of the Invention 
In one aspect, the present invention provides a method for producing a 
recombinant DNA encoding a protein, the method comprising: 

(a) digesting at least a first and a second DNA substrate molecule with a
restriction endonuclease, wherein the at least first and second DNA substrate
molecules are homologous, differ from each other in at least one nucleotide,
and each encode a protein or are homologous to a protein-encoding DNA
substrate molecule;

(b) ligating the resulting mixture of DNA fragments to generate a library
of recombinant DNA molecules, which library comprises a plurality of DNA
molecules, each comprising a subsequence from the first nucleic acid and a
subsequence from the second nucleic acid, wherein the plurality of DNA
molecules are homologous;

(c) screening or selecting the resulting products of (b) for a desired
property;

(d) recovering a recombinant DNA molecule encoding an evolved
protein; and,

(e) repeating steps (a)-(d) using the recombinant DNA molecule of step (d)
as the first or second DNA substrate molecule of step (a), whereby a
recombinant DNA encoding a protein is produced.
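
The digest-and-ligate recombination of steps (a) and (b) can be illustrated with a short in silico sketch in Python. It assumes, for illustration only, that the parents are aligned and share the same restriction sites, that each cut falls immediately before the recognition sequence (a real enzyme cuts at a defined offset and may leave cohesive ends), and that fragments religate in their original order. The function names `digest` and `ligate_library` and the example sequences are hypothetical; screening and recursion (steps (c) to (e)) are not modeled.

```python
from itertools import product


def digest(seq: str, site: str) -> list[str]:
    """Split seq immediately before each occurrence of the recognition site.

    Simplification: a real restriction endonuclease cuts at a fixed offset
    and may leave cohesive (sticky) ends.
    """
    cuts = []
    # Search from index 1 so a site at the very start does not yield an
    # empty leading fragment.
    pos = seq.find(site, 1)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def ligate_library(parents: list[str], site: str) -> set[str]:
    """Return every molecule obtainable by religating fragments in order,
    taking the fragment at each position from any parent."""
    fragment_sets = [digest(parent, site) for parent in parents]
    n_fragments = len(fragment_sets[0])
    if any(len(frags) != n_fragments for frags in fragment_sets):
        raise ValueError("parents must share the same restriction sites")
    # For each fragment position, collect the distinct versions present.
    columns = [sorted({frags[i] for frags in fragment_sets}) for i in range(n_fragments)]
    return {"".join(combo) for combo in product(*columns)}


parent_a = "AAAAGAATTCCCCC"  # hypothetical parent 1
parent_b = "AAATGAATTCCCGC"  # hypothetical parent 2, differing at two positions
library = ligate_library([parent_a, parent_b], "GAATTC")  # EcoRI recognition site
print(sorted(library))  # the two parents plus two chimeras
```

With one shared site and one difference on each side of it, the library contains four molecules, two of which are new combinations of the parental mutations.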

In another aspect, the present invention provides a method for evolving 
a protein encoded by a recombinant DNA substrate molecule by recombining 
at least a first and second DNA substrate molecule, the method comprising:

(a) providing at least first and second substrate molecules which differ
from each other in at least one nucleotide and which comprise defined
segments, the first and second substrate molecules each encoding a protein, or
being homologous to a protein-coding DNA, and providing a set of
oligonucleotide PCR primers, the set of PCR primers comprising a plurality
of primers, each of the plurality of PCR primers comprising a first
subsequence which is complementary to a first segment from the first
substrate molecule and a second subsequence which is complementary to a
second segment from the second substrate molecule, wherein the first
segment from the first substrate molecule comprises at least one nucleotide
difference as compared to the second segment;

(b) amplifying the segments of the at least a first and second DNA
substrate molecules with the primers of step (a) in a polymerase chain
reaction;

(c) assembling the products of step (b) to generate a library of
recombinant DNA substrate molecules;

(d) screening or selecting the products of (c) for a desired property;

(e) recovering a recombinant DNA substrate molecule from (d), thereby
providing a recombinant DNA substrate molecule encoding an evolved
protein; and,

(f) expressing the evolved protein, thereby producing the evolved
protein.







In another aspect, the present invention provides a method for evolving 
a protein encoded by a recombinant DNA substrate molecule, by recombining 
at least a first and second DNA substrate molecule, the method comprising: 

(a) providing at least first and second substrate molecules, which first
and second substrate molecules each encode a protein, or are homologous to
a protein-coding DNA substrate molecule, which first and second substrate
molecules share a region of sequence homology of about 10 to 100 base pairs
and comprise defined segments, and providing regions of homology in the at
least a first and second DNA substrate molecules by inserting an intron
sequence between at least two defined segments;

(b) fragmenting and recombining the DNA substrate molecules of (a),
wherein regions of homology are provided by the introns;

(c) screening or selecting the products of (b) for a desired property;

(d) recovering the recombinant DNA substrate molecule from the
products of (c), thereby providing a recombinant DNA substrate molecule
encoding an evolved protein; and

(e) expressing the evolved protein, thereby producing the evolved
protein.

In another aspect, the present invention provides a method for evolving 
a protein encoded by a DNA substrate molecule, the method comprising:

(a) providing a set of oligonucleotide PCR primers for amplification
and recombination of at least a first and second DNA substrate molecule,
wherein the at least a first and second substrate molecules differ from each
other in at least one nucleotide and comprise defined segments, and wherein
for each junction of segments a pair of primers is provided, one member of
each pair bridging the junction at one end of a segment and the other bridging
the junction at the other end of the segment, with the terminal ends of the
DNA molecule having as one member of the pair a generic primer, and
wherein a set of primers is provided for each of the at least a first and second
substrate molecules;

(b) amplifying the segments of the at least a first and second DNA
substrate molecules with the primers of (a) in a polymerase chain reaction;

(c) assembling the products of (b) to generate a pool of recombinant
molecules;

(d) selecting or screening the products of (c) for a desired property;
and

(e) recovering a recombinant DNA substrate molecule from the
products of (d) encoding an evolved protein.

In another aspect, the present invention provides a method for 
recombining at least a first and second DNA substrate molecule, comprising: 

(a) transfecting a host cell with at least a first and second DNA
substrate molecule wherein the at least a first and second DNA substrate
molecules are recombined in the host cell;

(b) screening or selecting the products of (a) for a desired property; 

(c) recovering recombinant DNA substrate molecules from (b); and 

(d) repeating steps (a) - (c). 

In another aspect, the present invention provides a method of
performing oligonucleotide-mediated recombination, the method comprising:

providing a first and a second nucleic acid;

selecting segments in the first and second nucleic acid;

providing a plurality of bridge oligonucleotides, which bridge
oligonucleotides each comprise at least a first subsequence which is
complementary to at least one segment in the first nucleic acid and at least a
second subsequence which is complementary to the second nucleic acid;

extending the plurality of bridge oligonucleotides with a polymerase,
using the first and second nucleic acids, or subsequences of the first and
second nucleic acids, as templates, thereby producing a plurality of
recombinant nucleic acid segments; and,

providing a plurality of recombinant nucleic acids, each comprising
one or more subsequences comprising one or more of the recombinant nucleic
acid segments.
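
Read at the sequence level, a bridge oligonucleotide of this aspect joins the end of a segment in one nucleic acid to the start of the next segment in the other. The Python sketch below designs such chimeric oligos under stated assumptions: the two nucleic acids are aligned top-strand sequences of equal length with no insertions or deletions, segment junctions are given as positions, and each arm has a fixed, hypothetical length `arm`. Because the oligo reads like the top strand, it is complementary to the bottom strand of one parent over its 5' arm and of the other parent over its 3' arm; its reverse complement anneals to the top strands. The function names are illustrative only, and polymerase extension is not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def bridge_oligos(first: str, second: str, junctions: list[int], arm: int = 10) -> list[tuple[int, str, str]]:
    """Design chimeric bridge oligos spanning each segment junction.

    Each oligo joins `arm` nucleotides ending at junction j in one parent
    to `arm` nucleotides starting at j in the other, in both orientations.
    """
    if len(first) != len(second):
        raise ValueError("substrates must be aligned and of equal length")
    oligos = []
    for j in junctions:
        if j < arm or j + arm > len(first):
            raise ValueError(f"junction {j} too close to an end for arm={arm}")
        for label, left, right in (("first->second", first, second),
                                   ("second->first", second, first)):
            top = left[j - arm:j] + right[j:j + arm]
            oligos.append((j, label, top))
    return oligos


first = "ACGTACGTAC"   # hypothetical aligned substrates
second = "TTTTTGGGGG"
for junction, orientation, top in bridge_oligos(first, second, junctions=[5], arm=3):
    print(junction, orientation, top, reverse_complement(top))
# 5 first->second GTAGGG CCCTAC
# 5 second->first TTTCGT ACGAAA
```

Designing both orientations at every junction lets crossovers occur in either direction between the two nucleic acids.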

In another aspect, the present invention provides a method for making 
a modified or recombinant nucleic acid, the method comprising:

(a) providing a single-stranded template nucleic acid;

(b) providing a population of nucleic acid fragments, the nucleic acid
fragments being produced by fragmentation of at least a first
nucleic acid substrate molecule, said at least first nucleic acid substrate
molecule or molecules being homologous to the template nucleic
acid and differing from the template nucleic acid in at least one nucleotide;

(c) contacting the single-stranded template nucleic acid with the 
population of nucleic acid fragments, thereby producing at least 
one annealed nucleic acid product; and, 

(d) contacting the products of (c) with a ligase, thereby producing a 
modified or recombinant nucleic acid. 

In another aspect, the present invention provides a method for making 
a modified or recombinant nucleic acid, the method comprising: 

(a) providing a selected single-stranded template nucleic acid; 

(b) contacting the selected single-stranded template nucleic acid with a 

population of nucleic acids or nucleic acid fragments, wherein the 
population of nucleic acids or nucleic acid fragments comprises 
one or more of: 

(i) nucleic acids or nucleic acid fragments which comprise nucleic 
acid sequences which are homologous to the single-stranded 
template nucleic acid; 

(ii) nucleic acids or nucleic acid fragments resulting from digestion of
at least a first substrate molecule with a DNase;

(iii) nucleic acids or nucleic acid fragments which comprise nucleic
acid sequences produced by mutagenesis of a parental nucleic
acid;

(iv) nucleic acids or nucleic acid fragments resulting from digestion of
at least a first substrate molecule with a restriction enzyme;

(v) nucleic acids or nucleic acid fragments comprising at least one
nucleic acid sequence which is homologous to the single-stranded
template nucleic acid, which sequence is present in the
population at a concentration of less than 1% by weight of the
total population of nucleic acids or nucleic acid fragments;

(vi) nucleic acids or nucleic acid fragments comprising at least one
hundred nucleic acid sequences which are homologous to the
template; or

(vii) nucleic acids or nucleic acid fragments comprising sequences of 
at least 50 nucleotides, 

thereby producing an annealed nucleic acid product; and 

(c) contacting the annealed nucleic acid with a polymerase or a ligase, 
thereby producing a modified or recombinant nucleic acid strand. 
In another aspect, the present invention provides a method for
optimizing expression of a protein by evolving the protein, wherein the
protein is encoded by a DNA substrate molecule, comprising:

(a) providing a set of oligonucleotides, wherein each oligonucleotide
comprises at least two regions complementary to the DNA molecule and at
least one degenerate region, each degenerate region encoding a region of an
amino acid sequence of the protein;

(b) assembling the set of oligonucleotides into a library of full-length
genes;

(c) expressing the products of (b) in a host cell;

(d) screening the products of (c) for improved expression of the
protein;

(e) recovering a recombinant DNA substrate molecule encoding an
evolved protein from (d); and

(f) expressing the evolved protein, thereby producing the evolved
protein.
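
The size of the library in step (b) follows directly from the degenerate positions in each oligonucleotide. As a sketch, assuming degeneracy is written with the standard IUPAC nucleotide ambiguity codes and using a single NNK codon flanked by fixed bases as a hypothetical degenerate region (the specification does not prescribe a particular scheme), the Python below counts and enumerates the sequences one oligonucleotide encodes. The helper names are illustrative.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    return prod(len(IUPAC[base]) for base in oligo.upper())


def expand(oligo: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*choices)]


# Hypothetical oligo: fixed flanks (standing in for the regions complementary
# to the DNA substrate) around one NNK codon. NNK covers all 20 amino acids
# plus a single stop codon (TAG).
oligo = "GCTNNKGCA"
print(library_size(oligo))  # 32
```

Each additional NNK codon multiplies the count by 32, which is one reason degenerate regions are usually kept short or restricted to selected positions.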

In another aspect, the present invention provides a method for 
optimizing secretion of a protein in a host by evolving a gene encoding a 
secretory function, comprising: 

(a) providing a cluster of genes encoding secretory functions; 

(b) recombining at least a first and second sequence in the gene 
cluster of (a) encoding a secretory function, the at least a first and second 
sequences differing from each other in at least one nucleotide, to generate a 
library of recombinant sequences; 

(c) transforming a host cell culture with the products of (b), wherein 
the host cell comprises a DNA sequence encoding the protein; 

(d) subjecting the product of (c) to screening or selection for secretion 
of the protein; and 

(e) recovering DNA encoding an evolved protein comprising a 
secretory function from the product of (d). 

A further aspect of the invention is a method of enriching a population 
of DNA fragments for mutant sequences comprising: 

(a) denaturing and renaturing the population of fragments to generate 
a population of hybrid double-stranded fragments in which at least one 
double-stranded fragment comprises at least one base pair mismatch; 

(b) fragmenting the products of (a) into fragments of about 20-100 bp; 
(c) affinity-purifying fragments having a mismatch on an affinity
matrix to generate a pool of DNA fragments enriched for mutant sequences; 
and 

(d) assembling the products of (c) to generate a library of recombinant 
DNA substrate molecules. 
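
The logic of steps (a) to (c) can be simulated to show why the captured pool is enriched for mutant sequences: duplexes formed from two identical strands carry no mismatch and are discarded. The Python sketch below assumes idealized capture of every mismatched fragment, represents each strand by the top-strand sequence of the fragment it came from, and ignores strand polarity; the function names and example sequences are hypothetical.

```python
import random


def heteroduplex_mismatches(top: str, bottom: str) -> list[int]:
    """Positions where a renatured duplex is mismatched (where the two
    parental sequences differ)."""
    return [i for i, (a, b) in enumerate(zip(top, bottom)) if a != b]


def enrich_for_mutants(population: list[str], size: int,
                       rng: random.Random) -> list[tuple[str, str]]:
    """Randomly re-pair strands, cut duplexes into windows, keep mismatched ones."""
    # (a) Denature and renature: each top strand pairs with the
    # complementary strand of a randomly chosen member of the population.
    partners = population[:]
    rng.shuffle(partners)
    kept = []
    for top, bottom in zip(population, partners):
        # (b) Fragment each duplex into consecutive windows of `size` bases.
        for start in range(0, len(top), size):
            window = (top[start:start + size], bottom[start:start + size])
            # (c) Idealized affinity capture: retain any window with a mismatch.
            if heteroduplex_mismatches(*window):
                kept.append(window)
    return kept


wild_type = "ACGT" * 10                          # hypothetical 40-nt fragment
mutant = wild_type[:25] + "G" + wild_type[26:]   # one substitution at position 25
kept = enrich_for_mutants([wild_type] * 9 + [mutant], size=20, rng=random.Random(0))
print(kept)
```

In the input, one strand in ten carries the mutation; in every retained duplex, one of the two strands does.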
A further aspect of the invention is a method for optimizing expression of a
protein encoded by a DNA substrate molecule by evolving the protein, wherein the DNA
substrate molecule comprises at least one lac operator and a fusion of a DNA sequence
encoding the protein with a DNA sequence encoding a lac headpiece dimer, the method
comprising:

(a) transforming a host cell with a library of mutagenized DNA substrate
molecules;

(b) inducing expression of the protein encoded by the library of (a);

(c) preparing an extract of the product of (b);

(d) fractionating insoluble protein from complexes of soluble protein and DNA;
and

(e) recovering a DNA substrate molecule encoding an evolved protein from
(d).

A further aspect of the invention is a method for evolving functional 
expression of a protein encoded by a DNA substrate molecule comprising a fusion of a DNA
sequence encoding the protein with a DNA sequence encoding a filamentous phage protein to
generate a fusion protein, the method comprising: 

(a) providing a host cell producing infectious particles expressing a fusion 
protein encoded by a library of mutagenized DNA substrate molecules; 

(b) recovering from (a) infectious particles displaying the fusion protein; 

(c) affinity purifying particles displaying the mutant protein using a ligand for 
the protein; and 

(d) recovering a DNA substrate molecule encoding an evolved protein from 
affinity purified particles of (c). 

A further aspect of the invention is a method for optimizing expression of a 
protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence
encoding the protein with a DNA sequence encoding a lac headpiece dimer, wherein the DNA
substrate molecule is present on a first plasmid vector, the method comprising:

(a) providing a host cell transformed with the first vector and a second vector
comprising a library of mutants of at least one chaperonin gene, and at least one lac
operator;

(b) preparing an extract of the product of (a);

(c) fractionating insoluble protein from complexes of soluble protein and DNA;
and

(d) recovering DNA encoding a chaperonin gene from (c).

A further aspect of the invention is a method for optimizing expression of a 
protein encoded by a DNA substrate molecule comprising a fusion of a DNA sequence 
encoding the protein with a filamentous phage gene, wherein the fusion is
carried on a phagemid comprising a library of chaperonin gene mutants, the
method comprising:

(a) providing a host cell producing infectious particles expressing a
fusion encoded by a library of mutagenized DNA substrate molecules;

(b) recovering from (a) infectious particles displaying the fusion
protein;

(c) affinity purifying particles displaying the protein using a ligand
for the protein; and

(d) recovering DNA encoding the mutant chaperonin from affinity
purified particles of (c).

A further aspect of the invention is a method for evolving an improved
DNA polymerase comprising:

(a) providing a library of mutant DNA substrate molecules encoding
mutant DNA polymerase;

(b) screening extracts of cells transfected with (a) and comparing
activity with wild-type DNA polymerase;

(c) recovering mutant DNA substrate molecules from cells in (b)
expressing mutant DNA polymerase having improved activity over wild-type
DNA polymerase; and

(d) recovering a DNA substrate molecule encoding an evolved
polymerase from the products of (c).

A further aspect of the invention is a method for evolving a DNA
polymerase with an error rate greater than that of wild-type DNA polymerase,
comprising:

(a) providing a library of mutant DNA substrate molecules encoding
mutant DNA polymerase in a host cell comprising an indicator gene having a
revertible mutation, wherein the indicator gene is replicated by the mutant
DNA polymerase;

(b) screening the products of (a) for revertants of the indicator gene;

(c) recovering mutant DNA substrate molecules from revertants; and

(d) recovering a DNA substrate molecule encoding an evolved polymerase
from the products of (c).

A further aspect of the invention is a method for evolving a DNA polymerase,
comprising:

(a) providing a library of mutant DNA substrate molecules encoding mutant
DNA polymerase, the library comprising a plasmid vector;

(b) preparing plasmid preparations and extracts of host cells transfected with
the products of (a);

(c) amplifying each plasmid preparation in a PCR reaction using the mutant
polymerase encoded by that plasmid, the polymerase being present in the host cell extract;

(d) recovering the PCR products of (c); and 

(e) recovering a DNA substrate molecule encoding an evolved polymerase 
from the products of (d). 

A further aspect of the invention is a method for evolving a p-nitrophenol
phosphonatase from a phosphonatase encoded by a DNA substrate molecule, comprising:

(a) providing a library of mutants of the DNA substrate molecule, the library
comprising a plasmid expression vector;

(b) transfecting a host in which the phn operon is deleted;

(c) selecting for growth of the transfectants of (b) using p-nitrophenol
phosphonate as a substrate;

(d) recovering the DNA substrate molecules from transfectants selected in
(c); and

(e) recovering a DNA substrate molecule from (d) encoding an evolved
phosphonatase.

A further aspect of the invention is a method for evolving a protease encoded 
by a DNA substrate molecule comprising: 

(a) providing a library of mutants of the DNA substrate molecule, the library
comprising a plasmid expression vector, wherein the DNA substrate molecule is linked to a
secretory leader;

(b) transfecting a host;

(c) selecting for growth of the transfectants of (b) on a complex protein
medium; and

(d) recovering a DNA substrate molecule from (c) encoding an evolved
protease.

A further aspect of the invention is a method for screening a library of 
protease mutants displayed on a phage to obtain an improved protease, wherein a DNA 
substrate molecule encoding the protease is fused to DNA encoding a filamentous phage
protein to generate a fusion protein, comprising:

(a) providing host cells expressing the fusion protein;

(b) overlaying host cells with a protein net to entrap the phage;

(c) washing the product of (b) to recover phage liberated by digestion of the
protein net;

(d) recovering DNA from the product of (c); and

(e) recovering a DNA substrate from (d) encoding an improved protease.

A further aspect of the invention is a method for screening a library of
protease mutants to obtain an improved protease, the method comprising:

(a) providing a library of peptide substrates, the peptide substrate comprising
a fluorophore and a fluorescence quencher;

(b) screening the library of protease mutants for ability to cleave the peptide
substrates, wherein fluorescence is measured; and

(c) recovering DNA encoding at least one protease mutant from (b).

A further aspect of the invention is a method for evolving an alpha interferon 
gene comprising: 

(a) providing a library of mutant alpha interferon genes, the library comprising 
a filamentous phage vector;

(b) stimulating cells comprising a reporter construct, the reporter construct
comprising a reporter gene under control of an interferon-responsive promoter, wherein
the reporter gene is GFP;

(c) separating the cells expressing GFP by FACS;

(d) recovering phage from the product of (c); and

(e) recovering an evolved interferon gene from the product of (d).

A further aspect of the invention is a method for screening a library of mutants 
of a DNA substrate encoding a protein for an evolved DNA substrate, comprising: 

(a) providing a library of mutants, the library comprising an expression vector; 

(b) transfecting a mammalian host cell with the library of (a), wherein mutant
protein is expressed on the surface of the cell;

(c) screening or selecting the products of (b) with a ligand for the protein;

(d) recovering DNA encoding mutant protein from the products of (c); and 

(e) recovering an evolved DNA substrate from the products of (d). 

A further aspect of the invention is a method for evolving a DNA substrate 
molecule encoding an interferon alpha, comprising:

(a) providing a library of mutant alpha interferon genes, the library comprising
an expression vector wherein the alpha interferon genes are expressed under the control of
an inducible promoter;

(b) transfecting host cells with the library of (a); 

(c) contacting the product of (b) with a virus; 

(d) recovering DNA encoding a mutant alpha interferon from host cells 
surviving step (c); and 

(e) recovering an evolved interferon gene from the product of (d). 

A further aspect of the invention is a method for evolving the serum stability or 
circulation half-life of a protein encoded by a DNA substrate molecule, the DNA substrate 
molecule comprising a fusion of a DNA sequence encoding the protein with a DNA sequence 
encoding a filamentous phage protein to generate a fusion protein, the method comprising: 

(a) providing a host cell expressing a library of mutants of the fusion protein; 

(b) affinity purifying the mutants with a ligand for the protein, wherein the 
ligand is a human serum protein, tissue specific protein, or receptor; 

(c) recovering DNA encoding a mutant protein from the affinity selected 

mutants of (b); and 

(d) recovering an evolved gene encoding the protein from the product of (c).

A further aspect of the invention is a method for evolving a protein having at
least two subunits, comprising:

(a) providing a library of mutant DNA substrate molecules for each subunit;

(b) recombining the libraries into a library of single chain constructs of the
protein, the single chain construct comprising a DNA substrate molecule encoding each
subunit sequence, the subunit sequences being linked by a linker joining a nucleic acid
sequence encoding the amino terminus of one subunit to a nucleic acid sequence encoding
the carboxy terminus of a second subunit;

(c) screening or selecting the products of (b);

(d) recovering recombinant single chain construct DNA substrate molecules
from the products of (c);

(e) subjecting the products of (d) to mutagenesis; and

(f) recovering an evolved single chain construct DNA substrate molecule from
(e).

A further aspect of the invention is a method for evolving the coupling of a 
mammalian 7-transmembrane receptor to a yeast signal transduction pathway, comprising: 

(a) expressing a library of mammalian G alpha protein mutants in a host
cell, wherein the host cell expresses the mammalian 7-transmembrane receptor and a
reporter gene, the reporter gene being expressed under control of a
pheromone-responsive promoter;

(b) screening or selecting the products of (a) for expression of the
reporter gene in the presence of a ligand for the 7-transmembrane receptor;
and

(c) recovering DNA encoding an evolved G alpha protein mutant
from screened or selected products of (b).

A further aspect of the invention is a method for recombining at least a
first and second DNA substrate molecule, comprising:

(a) transfecting a host cell with at least a first and second DNA
substrate molecule wherein the at least a first and second DNA substrate
molecules are recombined in the host cell;

(b) screening or selecting the products of (a) for a desired property; and

(c) recovering recombinant DNA substrate molecules from (b).

Throughout this specification the word "comprise", or variations such
as "comprises" or "comprising", will be understood to imply the inclusion of a
stated element, integer or step, or group of elements, integers or steps, but not
the exclusion of any other element, integer or step, or group of elements,
integers or steps.

Brief Description of the Drawings

Figure 1 depicts the alignment of oligo PCR primers for evolution of
bovine calf intestinal alkaline phosphatase.

Figure 2 depicts the alignment of alpha interferon amino acid and 

nucleic acid sequences. 

Figure 3 depicts the alignment of chimeric alpha interferon amino acid 

sequences. 
Description of the Specific Embodiments

The invention provides a number of strategies for evolving
polypeptides through recursive recombination methods. In some
embodiments, the strategies of the invention can generally be classified as
"coarse grain shuffling" and "fine grain shuffling". As described in detail
below, these strategies are especially applicable in situations where some
structural or functional information is available regarding the polypeptides of
interest, where the nucleic acid to be manipulated is large, when selection or screening of many
recombinants is cumbersome, and so on. "Coarse grain shuffling" generally involves the
exchange or recombination of segments of nucleic acids, whether defined as functional
domains, exons, restriction endonuclease fragments, or otherwise arbitrarily defined
segments. "Fine grain shuffling" generally involves the introduction of sequence variation
within a segment, such as within codons.

Coarse grain and fine grain shuffling allow analysis of variation occurring within
a nucleic acid sequence, also termed "searching of sequence space." Although both
techniques are meritorious, the results are qualitatively different. For example, coarse grain
searches are often better suited for optimizing multigene clusters such as polyketide
operons, whereas fine grain searches are often optimal for optimizing a property such as
protein expression using codon usage libraries.

The strategies generally entail evolution of gene(s) or segment(s) thereof to
allow retention of function in a heterologous cell or improvement of function in a homologous
or heterologous cell. Evolution is generally effected by a process termed recursive sequence
recombination. Recursive sequence recombination can be achieved in many different
formats and permutations of formats, as described in further detail below. These formats
share some common principles. Recursive sequence recombination entails successive
cycles of recombination to generate molecular diversity, i.e., the creation of a family of
nucleic acid molecules showing substantial sequence identity to each other but differing in
the presence of mutations. Each recombination cycle is followed by at least one cycle of
screening or selection for molecules having a desired characteristic. The molecule(s)
selected in one round form the starting materials for generating diversity in the next round.
In any given cycle, recombination can occur in vivo or in vitro. Furthermore, diversity
resulting from recombination can be augmented in any cycle by applying prior methods of
mutagenesis (e.g., error-prone PCR or cassette mutagenesis, passage through bacterial
mutator strains, treatment with chemical mutagens, "spiking" with sequence diversity from
homologous gene families) to either the substrates for or products of recombination.
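
The cycle just described (recombine, screen, and reuse the selected molecules as substrates) can be expressed as a small in silico loop. This Python sketch is a toy model, not the laboratory procedure: recombination is modeled as a single crossover between two aligned parents, screening as ranking by a user-supplied `fitness` function, and optional mutagenesis as random base substitution. Carrying the current pool forward into each round's ranking is an additional assumption. All names and the target sequence are hypothetical.

```python
import random
from collections.abc import Callable

ALPHABET = "ACGT"


def recombine(a: str, b: str, rng: random.Random) -> str:
    """Single-crossover recombination of two aligned, equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Optional point mutagenesis: substitute each base with probability `rate`."""
    return "".join(
        rng.choice([x for x in ALPHABET if x != base]) if rng.random() < rate else base
        for base in seq
    )


def evolve(parents: list[str], fitness: Callable[[str], int], rounds: int,
           library_size: int, keep: int, rng: random.Random,
           mutation_rate: float = 0.0) -> list[str]:
    """Recursive cycles of recombination followed by screening."""
    if keep < 2 or len(set(parents)) < 2:
        raise ValueError("need at least two distinct parents and keep >= 2")
    pool = list(parents)
    for _ in range(rounds):
        # Recombination (plus optional mutagenesis) builds the library.
        library = [
            mutate(recombine(*rng.sample(pool, 2), rng), mutation_rate, rng)
            for _ in range(library_size)
        ]
        # Screening: rank library and current pool together, keep the best.
        candidates = sorted(set(library) | set(pool))  # sorted for determinism
        pool = sorted(candidates, key=fitness, reverse=True)[:keep]
    return pool


target = "ACGTTGCAACGTTGCAACGT"  # hypothetical optimum used only for scoring
def fitness(s: str) -> int:
    return sum(a == b for a, b in zip(s, target))

parents = [target[:10] + "A" * 10, "A" * 10 + target[10:]]
result = evolve(parents, fitness, rounds=5, library_size=40, keep=4, rng=random.Random(1))
print(result[0], fitness(result[0]))
```

Because each parent carries half of the beneficial positions, only recombination can bring them together; ranking the previous pool alongside the new library means the best score never decreases between rounds.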

I. Formats for Recursive Sequence Recombination

Some formats and examples for recursive sequence recombination,
sometimes referred to as DNA shuffling, evolution, or molecular breeding, have been
described by the present inventors and co-workers in co-pending applications U.S. Patent
Application Serial No. 08/198,431, filed February 17, 1994, Serial No. PCT/US95/02126,
filed February 17, 1995, Serial No. 08/425,684, filed April 18, 1995, Serial No. 08/537,874,
filed October 30, 1995, Serial No. 08/564,955, filed November 30, 1995, Serial No.
08/621,859, filed March 25, 1996, Serial No. 08/621,430, filed March 25, 1996, Serial No.
PCT/US96/05480, filed April 18, 1996, Serial No. 08/850,400, filed May 20, 1996, Serial No.
08/675,502, filed July 3, 1996, Serial No. 08/721,824, filed September 27, 1996, and
08/722,660, filed September 27, 1996; Stemmer, Science 270:1510 (1995); Stemmer et al.,
Gene 164:49-53 (1995); Stemmer, Bio/Technology 13:549-553 (1995); Stemmer, Proc. Natl.
Acad. Sci. U.S.A. 91:10747-10751 (1994); Stemmer, Nature 370:389-391 (1994); Crameri et
al., Nature Medicine 2(1):1-3 (1996); Crameri et al., Nature Biotechnology 14:315-319 (1996),
each of which is incorporated by reference in its entirety for all purposes.

In general, the term "gene" is used herein broadly to refer to any segment or
sequence of DNA associated with a biological function. Genes can be obtained from a
variety of sources, including cloning from a source of interest or synthesizing from known or
predicted sequence information, and may include sequences designed to have desired
parameters.

A wide variety of cell types can be used as a recipient of evolved genes.
Cells of particular interest include many bacterial cell types, both gram-negative and
gram-positive, such as Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria,
Penicillium, Bacillus, Escherichia coli, Pseudomonas, Salmonella, and Erwinia. Cells of
interest also include eukaryotic cells, particularly mammalian cells (e.g., mouse, hamster,
primate, human), both cell lines and primary cultures. Such cells include stem cells,
including embryonic stem cells, zygotes, fibroblasts, lymphocytes, Chinese hamster ovary
(CHO), mouse fibroblasts (NIH3T3), kidney, liver, muscle, and skin cells. Other eukaryotic
cells of interest include plant cells, such as maize, rice, wheat, cotton, soybean, sugarcane,
tobacco, and arabidopsis; fish, algae, fungi (Penicillium, Fusarium, Aspergillus, Podospora,
Neurospora), insects, and yeasts (Pichia and Saccharomyces).

The choice of host will depend on a number of factors related to the
intended use of the engineered host, including pathogenicity, substrate range, environmental
hardiness, presence of key intermediates, ease of genetic manipulation, and likelihood of
promiscuous transfer of genetic information to other organisms. A preferred host has the
ability to replicate vector DNA, express proteins of interest, and properly traffic proteins of
interest. Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes,
Actinomycetes, fungi such as Saccharomyces cerevisiae or Pichia pastoris, Schneider cells,
L-cells, COS cells, CHO cells, and transformed B cell lines such as SP2/0, J558, NS-1 and
AG8-653.

The breeding procedure starts with at least two substrates that generally
show substantial sequence identity to each other (i.e., at least about 50%, 70%, 80% or 90%
sequence identity), but differ from each other at certain positions. The difference can be any
type of mutation, for example, substitutions, insertions and deletions. Often, different
segments differ from each other in perhaps 5-20 positions. For recombination to generate
increased diversity relative to the starting materials, the starting materials must differ from
each other in at least two nucleotide positions. That is, if there are only two substrates, there
should be at least two divergent positions. If there are three substrates, for example, one
substrate can differ from the second at a single position, and the second can differ from the
third at a different single position. The starting DNA segments can be natural variants of
each other, for example, allelic or species variants. The segments can also be from
nonallelic genes showing some degree of structural and usually functional relatedness (e.g.,
different genes within a superfamily such as the immunoglobulin superfamily). The starting
DNA segments can also be induced variants of each other. For example, one DNA segment
can be produced by error-prone PCR replication of the other, or by substitution of a
mutagenic cassette. Induced mutants can also be prepared by propagating one (or both) of
the segments in a mutagenic strain. In these situations, strictly speaking, the second DNA
segment is not a single segment but a large family of related segments. The different
segments forming the starting materials are often the same length or substantially the same
length. However, this need not be the case. For example, one segment can be a
subsequence of another. The segments can be present as part of larger molecules, such as
vectors, or can be in isolated form.
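The identity and divergent-position requirements above can be checked computationally before any breeding is attempted. The following Python sketch assumes the two substrates are already aligned to equal length (gapped alignment is not handled); the function name `compare_substrates` and the example sequences are hypothetical illustrations, not part of the described method.

```python
def compare_substrates(a: str, b: str):
    """Return (percent identity, 0-based divergent positions) for two
    pre-aligned, equal-length sequences. Gaps are not modeled."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    diffs = [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]
    identity = 100.0 * (len(a) - len(diffs)) / len(a)
    return identity, diffs


# Hypothetical example substrates differing at two positions.
parent1 = "ATGGCTAGCAAAGGAGAAGAACTT"
parent2 = "ATGGCAAGCAAAGGTGAAGAACTT"
identity, diffs = compare_substrates(parent1, parent2)
print(f"{identity:.1f}% identity; divergent positions: {diffs}")
# Two substrates can only yield new combinations if they differ at >= 2 positions.
print("enough divergence for recombination:", len(diffs) >= 2)
```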

The starting DNA segments are recombined by any of the recursive sequence
recombination formats provided herein to generate a diverse library of recombinant DNA
segments. Such a library can vary widely in size from having fewer than 10 to more than
10^5, 10^9, or 10^12 members. In general, the starting segments and the recombinant libraries
generated include full-length coding sequences and any essential regulatory sequences,
such as a promoter and polyadenylation sequence, required for expression. However, if this
is not the case, the recombinant DNA segments in the library can be inserted into a common
vector providing the missing sequences before performing screening/selection.

If the recursive sequence recombination format employed is an in vivo format,
the library of recombinant DNA segments generated already exists in a cell, which is usually
the cell type in which expression of the enzyme with altered substrate specificity is desired.
If recursive sequence recombination is performed in vitro, the recombinant library is
preferably introduced into the desired cell type before screening/selection. The members of
the recombinant library can be linked to an episome or virus before introduction or can be
introduced directly. In some embodiments of the invention, the library is amplified in a first
host, and is then recovered from that host and introduced into a second host more amenable
to expression, selection, or screening, or more desirable with respect to any other parameter.
The manner in which the library is introduced into the cell type depends on the DNA-uptake
characteristics of the cell type, e.g., having viral receptors, being capable of conjugation, or
being naturally competent. If the cell type is insusceptible to natural and chemically induced
competence, but susceptible to electroporation, one would usually employ electroporation. If
the cell type is insusceptible to electroporation as well, one can employ biolistics. The
biolistic PDS-1000 Gene Gun (Bio-Rad, Hercules, CA) uses helium pressure to accelerate
DNA-coated gold or tungsten microcarriers toward target cells. The process is applicable to
a wide range of tissues, including plants, bacteria, fungi, algae, intact animal tissues, tissue
culture cells, and animal embryos. One can also employ electronic pulse delivery, which is
essentially a mild electroporation format for live tissues in animals and patients (Zhao,
Advanced Drug Delivery Reviews 17:257-262 (1995)). Novel methods for making cells
competent are described in co-pending U.S. Patent Application Serial No. 08/621,430, filed
March 25, 1996. After introduction of the library of recombinant genes, the cells are
optionally propagated to allow expression of the genes to occur.
A. In Vitro Formats

One format for recursive sequence recombination utilizes a pool of related
sequences. The sequences can be DNA or RNA and can be of various lengths depending
on the size of the gene or DNA fragment to be recombined or reassembled. Preferably the
sequences are from 50 bp to 100 kb.

The pool of related substrates can be fragmented, usually at random, into
fragments of from about 5 bp to 5 kb or more. Preferably the size of the random fragments
is from about 10 bp to 1000 bp; more preferably, the size of the DNA fragments is from about
20 bp to 500 bp. The substrates can be digested by a number of different methods, such as
DNase I or RNase digestion, random shearing, or restriction enzyme digestion. The
concentration of nucleic acid fragments of a particular length is often less than 0.1% or 1%
by weight of the total nucleic acid. The number of different specific nucleic acid fragments in
the mixture is usually at least about 100, 500 or 1000.

The mixed population of nucleic acid fragments is denatured by heating to
about 80°C to 100°C, more preferably from 90°C to 96°C, to form single-stranded nucleic
acid fragments. Single-stranded nucleic acid fragments having regions of sequence identity
with other single-stranded nucleic acid fragments can then be reannealed by cooling to 6°C
to 75°C, and preferably from 40°C to 65°C. Renaturation can be accelerated by the addition
of polyethylene glycol ("PEG") or salt. The salt concentration is preferably from 0 mM to 600
mM, more preferably from 10 mM to 100 mM. The salt may be, for example, (NH4)2SO4,
KCl, or NaCl. The concentration of PEG is preferably from 0% to 20%, more preferably from
5% to 10%. The fragments that reanneal can be from different substrates.

The annealed nucleic acid fragments are incubated in the presence of a
nucleic acid polymerase, such as Taq or Klenow, Mg2+ at 1 mM-20 mM, and dNTPs (i.e.,
dATP, dCTP, dGTP and dTTP). If regions of sequence identity are large, Taq or another high-
temperature polymerase can be used with an annealing temperature of between 45°C and
85°C. If the areas of identity are small, Klenow or other polymerases that are active at low
temperature can be used, with an annealing temperature of between 6°C and 30°C. The
polymerase can be added to the random nucleic acid fragments prior to annealing,
simultaneously with annealing, or after annealing.

The cycle of denaturation, renaturation and incubation of random nucleic acid
fragments in the presence of polymerase is sometimes referred to as "shuffling" of the
nucleic acid in vitro. This cycle is repeated a desired number of times. Preferably the
cycle is repeated from 2 to 100 times, more preferably from 10 to 40 times. The resulting
nucleic acids are a family of double-stranded polynucleotides of from about 50 bp to about
100 kb, preferably from 500 bp to 50 kb. The population represents variants of the starting
substrates showing substantial sequence identity thereto but also diverging at several
positions. The population has many more members than the starting substrates. The
population of fragments resulting from recombination is preferably first amplified by PCR,
then cloned into an appropriate vector, and the ligation mixture is used to transform host
cells.
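The fragmentation, reannealing and extension cycle can be caricatured in silico as random template switching between aligned parents: each reassembled strand copies one parent until a fragment primes on a homologous region of another parent. The Python sketch below is a toy model under that assumption only; it ignores fragment sizes, annealing temperatures, polymerase errors and mismatch tolerance, and the names `shuffle_once`, `shuffle_library` and the per-position `switch_prob` are hypothetical.

```python
import random


def shuffle_once(parents, switch_prob, rng):
    """Build one chimera by copying aligned parents position by position,
    occasionally jumping to a randomly chosen parent (a crude stand-in for
    a fragment priming extension on a different homologous template)."""
    current = rng.randrange(len(parents))
    out = []
    for i in range(len(parents[0])):
        if i > 0 and rng.random() < switch_prob:
            current = rng.randrange(len(parents))  # may re-pick the same parent
        out.append(parents[current][i])
    return "".join(out)


def shuffle_library(parents, n, switch_prob=0.05, seed=0):
    """Return n toy recombinants; parents must be aligned to equal length."""
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be aligned to equal length")
    rng = random.Random(seed)  # seeded for reproducibility
    return [shuffle_once(parents, switch_prob, rng) for _ in range(n)]


# Hypothetical parents sharing a backbone, each with private point differences.
parents = [
    "ACGTACGTACGTACGTACGT",
    "ACTTACGTACATACGTACGT",  # differs from the first at positions 2 and 10
    "ACGTACCTACGTACGTAGGT",  # differs from the first at positions 6 and 17
]
library = shuffle_library(parents, n=1000, switch_prob=0.1)
print(len(set(library)), "distinct sequences from", len(parents), "parents")
```

In this model, raising `switch_prob` loosely corresponds to using smaller fragments (more crossovers per gene); it is an illustrative knob, not a calibrated parameter.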

In a variation of in vitro shuffling, subsequences of recombination substrates
can be generated by amplifying the full-length sequences under conditions which produce a
substantial fraction, typically at least 20 percent or more, of incompletely extended
amplification products. The amplification products, including the incompletely extended
amplification products, are denatured and subjected to at least one additional cycle of
reannealing and amplification. This variation, wherein at least one cycle of reannealing and
amplification provides a substantial fraction of incompletely extended products, is termed
"stuttering." In the subsequent amplification round, the incompletely extended products
anneal to and prime extension on different sequence-related template species.

In a further variation, at least one cycle of amplification can be conducted
using a collection of overlapping single-stranded DNA fragments of related sequence and
different lengths. Each fragment can hybridize to and prime polynucleotide chain extension
of a second fragment from the collection, thus forming sequence-recombined
polynucleotides. In a further variation, single-stranded DNA fragments of variable length can
be generated from a single primer by Vent DNA polymerase on a first DNA template. The
single-stranded DNA fragments are used as primers for a second, Kunkel-type template,
consisting of a uracil-containing circular single-stranded DNA. This results in multiple
substitutions of the first template into the second (see Levichkin et al., Mol. Biology
29:572-577 (1995)).

Nucleic acid sequences can be recombined by recursive sequence
recombination even if they lack sequence homology. Homology can be introduced using
synthetic oligonucleotides as PCR primers. In addition to the specific sequences for the
nucleic acid segment being amplified, all of the primers used to amplify one particular
segment are synthesized to contain an additional sequence of 20-40 bases 5' to the segment
(sequence A) and a different 20-40 base sequence 3' to the segment (sequence B). An
adjacent segment is amplified using a 5' primer which contains the complementary strand of
sequence B (sequence B'), and a 3' primer containing a different 20-40 base sequence (C).
Similarly, primers for the next adjacent segment contain sequences C' (complementary to C)
and D. In this way, small regions of homology are introduced, making the segments into site-
specific recombination cassettes. Subsequent to the initial amplification of individual
segments, the amplified segments can then be mixed and subjected to primerless PCR.
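The bookkeeping for the linker tails (A, B, C, ...) can be sketched in Python. This sketch assumes a common overlap-extension convention in which the forward primer for segment i carries tail i at its 5' end and the reverse primer carries the reverse complement of tail i+1; the function names, the annealing length and the tail sequences are hypothetical, and real primer design would also check melting temperature, secondary structure and tail uniqueness.

```python
# Complement table for uppercase and lowercase DNA.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tailed_primers(segments, tails, anneal_len=18):
    """Design (forward, reverse) primers, written 5'->3', for each segment.

    segments: top-strand sequences, in their final order.
    tails: len(segments) + 1 linker sequences (A, B, C, ...).
    The amplified top strand of segment i becomes tails[i] + segment + tails[i+1],
    so neighbouring products overlap on their shared tail.
    """
    if len(tails) != len(segments) + 1:
        raise ValueError("need exactly one more tail than segments")
    primers = []
    for i, seg in enumerate(segments):
        if len(seg) < anneal_len:
            raise ValueError("segment shorter than the annealing region")
        forward = tails[i] + seg[:anneal_len]
        # Reverse primer: complement of the next tail sits at its 5' end.
        reverse = revcomp(seg[-anneal_len:] + tails[i + 1])
        primers.append((forward, reverse))
    return primers


# Hypothetical 20-nt segments and 10-nt tails (the text uses 20-40 nt tails).
segments = ["ATGACCGGTTAAGCTAGCTA", "GGCATCGATCGATTACGGAT"]
tails = ["GACTGACTGA", "CCATGGCCAT", "TTAGCTTAGC"]
for fwd, rev in tailed_primers(segments, tails, anneal_len=12):
    print(fwd, rev)
```

Because the two products share the middle tail on their top strands, they can anneal to each other and extend in the primerless PCR step.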

When domains within a polypeptide are shuffled, it may not be possible to
introduce additional flanking sequences to the domains, due to the constraint of maintaining
a continuous open reading frame. Instead, groups of oligonucleotides are synthesized that
are homologous to the 3' end of the first domain encoded by one of the genes to be shuffled
and the 5' ends of the second domains encoded by all of the other genes to be shuffled
together. This is repeated with all domains, thus providing sequences that allow
recombination between protein domains while maintaining their order.
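Because the bridging oligonucleotides make every domain at one position compatible with every domain at the next, the space of order-preserving domain chimeras is the Cartesian product of the per-position choices. The Python sketch below enumerates that space; the parent names and the short peptide strings standing in for domains are hypothetical, and the code models only the combinatorics, not the oligonucleotide chemistry.

```python
from itertools import product


def domain_chimeras(parents):
    """Yield (label, sequence) for every chimera that keeps domain order.

    parents: dict mapping a parent name to its list of domain sequences;
    every parent must have the same number of domains.
    """
    names = list(parents)
    n_domains = len(parents[names[0]])
    if any(len(domains) != n_domains for domains in parents.values()):
        raise ValueError("all parents need the same number of domains")
    # One parent choice per domain position, positions kept in order.
    for choice in product(names, repeat=n_domains):
        label = "-".join(choice)
        sequence = "".join(parents[name][k] for k, name in enumerate(choice))
        yield label, sequence


# Hypothetical parents, each split into three domains.
parents = {"P1": ["MKT", "LLV", "AGG"], "P2": ["MRT", "LIV", "SGG"]}
chimeras = dict(domain_chimeras(parents))
print(len(chimeras))
print(chimeras["P1-P2-P1"])
```

With two parents and three domains the sketch yields 2^3 = 8 chimeras, including both parents; in general p parents and d domains give p^d combinations.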
B. In Vivo Formats 

1. Plasmid-Plasmid Recombination

The initial substrates for recombination are a collection of polynucleotides
comprising variant forms of a gene. The variant forms usually show substantial sequence
identity to each other sufficient to allow homologous recombination between substrates. The
diversity between the polynucleotides can be natural (e.g., allelic or species variants),
induced (e.g., error-prone PCR or error-prone recursive sequence recombination), or the
result of in vitro recombination. Diversity can also result from resynthesizing genes encoding
natural proteins with alternative codon usage. There should be at least sufficient diversity
between substrates that recombination can generate more diverse products than there are
starting materials. There must be at least two substrates differing in at least two positions.
However, commonly a library of substrates of 10^3-10^8 members is employed. The degree of
diversity depends on the length of the substrate being recombined and the extent of the
functional change to be evolved. Diversity at between 0.1% and 25% of positions is typical.
The diverse substrates are incorporated into plasmids. The plasmids are often standard
cloning vectors, e.g., bacterial multicopy plasmids. However, in some methods to be
described below, the plasmids include mobilization (MOB) functions. The substrates can be
incorporated into the same or different plasmids. Often at least two different types of
plasmid having different types of selectable markers are used to allow selection for cells
containing at least two types of vector. Also, where different types of plasmid are employed,
the different plasmids can come from two distinct incompatibility groups to allow stable co-
existence of two different plasmids within the cell. Nevertheless, plasmids from the same
incompatibility group can still co-exist within the same cell for sufficient time to allow
homologous recombination to occur.

Plasmids containing diverse substrates are initially introduced into cells by any
method (e.g., chemical transformation, natural competence, electroporation, biolistics,
packaging into phage or viral systems). Often, the plasmids are present at or near saturating
concentration (with respect to maximum transfection capacity) to increase the probability of
more than one plasmid entering the same cell. The plasmids containing the various
substrates can be transfected simultaneously or in multiple rounds. For example, in the
latter approach, cells can be transfected with a first aliquot of plasmid, transfectants selected
and propagated, and then transfected with a second aliquot of plasmid.
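Why near-saturating plasmid concentrations help can be made concrete with a simple model. Assuming, purely for illustration, that the number of plasmids taken up per cell is Poisson-distributed with mean λ, the fraction of cells receiving two or more plasmids is 1 - e^(-λ)(1 + λ). The Python sketch below evaluates this, and the same fraction among transformants (cells with at least one plasmid); the Poisson assumption and the function names are assumptions of this sketch, not part of the source.

```python
import math


def p_two_or_more(mean_uptake: float) -> float:
    """Poisson probability that a cell receives two or more plasmids."""
    return 1.0 - math.exp(-mean_uptake) * (1.0 + mean_uptake)


def p_two_or_more_given_transformed(mean_uptake: float) -> float:
    """Same probability, restricted to cells that received at least one
    plasmid (e.g., cells surviving selection for a plasmid marker)."""
    return p_two_or_more(mean_uptake) / (1.0 - math.exp(-mean_uptake))


for lam in (0.1, 0.5, 1.0, 3.0):
    print(f"mean uptake {lam}: P(>=2) = {p_two_or_more(lam):.3f}, "
          f"among transformants = {p_two_or_more_given_transformed(lam):.3f}")
```

Under this model, raising the mean uptake from 0.1 to 3 plasmids per cell increases the multiply transformed fraction by more than two orders of magnitude.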

Having introduced the plasmids into cells, recombination between substrates
to generate recombinant genes occurs within cells containing multiple different plasmids
merely by propagating the cells. However, cells that receive only one plasmid are unable to
participate in recombination, and the potential contribution of substrates on such plasmids to
evolution is not fully exploited (although these plasmids may contribute to some extent if they
are propagated in mutator cells). The rate of evolution can be increased by allowing all
substrates to participate in recombination. This can be achieved by subjecting transfected
cells to electroporation. The conditions for electroporation are the same as those
conventionally used for introducing exogenous DNA into cells (e.g., 1,000-2,500 volts, 400
µF and a 1-2 mm gap). Under these conditions, plasmids are exchanged between cells,
allowing all substrates to participate in recombination. In addition, the products of
recombination can undergo further rounds of recombination with each other or with the
original substrate. The rate of evolution can also be increased by use of conjugative
transfer. To exploit conjugative transfer, substrates can be cloned into plasmids having
MOB genes, and tra genes are also provided in cis or in trans to the MOB genes. The effect
of conjugative transfer is very similar to electroporation in that it allows plasmids to move
between cells and allows recombination between any substrate and the products of previous
recombination to occur, merely by propagating the culture. The rate of evolution can also be
increased by fusing cells to induce exchange of plasmids or chromosomes. Fusion can be
induced by chemical agents, such as PEG, or viral proteins, such as influenza virus
hemagglutinin, HSV-1 gB and gD. The rate of evolution can also be increased by use of
mutator host cells (e.g., Mut L, S, D, T, H in bacteria and Ataxia telangiectasia human cell
lines).

The time for which cells are propagated and recombination is allowed to 
occur, of course, varies with the cell type but is generally not critical, because even a small 
degree of recombination can substantially increase diversity relative to the starting materials. 
Cells bearing plasmids containing recombined genes are subject to screening or selection for 
a desired function. For example, if the substrate being evolved contains a drug resistance 
gene, one would select for drug resistance. Cells surviving screening or selection can be 
subjected to one or more rounds of screening/selection followed by recombination or can be 
subjected directly to an additional round of recombination. "Screening" as used herein is 
intended to include "selection" as a type of screen. 

The next round of recombination can be achieved by several different formats
independently of the previous round. For example, a further round of recombination can be
effected simply by resuming the electroporation or conjugation-mediated intercellular transfer
of plasmids described above. Alternatively, a fresh substrate or substrates, the same as or
different from previous substrates, can be transfected into cells surviving
selection/screening. Optionally, the new substrates are included in plasmid vectors bearing
a different selective marker and/or from a different incompatibility group than the original
plasmids. As a further alternative, cells surviving selection/screening can be subdivided into
two subpopulations, and plasmid DNA from one subpopulation transfected into the other,
where the substrates from the plasmids from the two subpopulations undergo a further round
of recombination. In either of the latter two options, the rate of evolution can be increased by
employing DNA extraction, electroporation, conjugation or mutator cells, as described above.
In a still further variation, DNA from cells surviving screening/selection can be extracted and
subjected to in vitro recursive sequence recombination.

After the second round of recombination, a second round of
screening/selection is performed, preferably under conditions of increased stringency. If
desired, further rounds of recombination and selection/screening can be performed using the
same strategy as for the second round. With successive rounds of recombination and
selection/screening, the surviving recombined substrates evolve toward acquisition of a
desired phenotype. Typically, in this and other methods of recursive recombination, the final
product of recombination that has acquired the desired phenotype differs from starting
substrates at 0.1%-25% of positions and has evolved at a rate orders of magnitude in excess
(e.g., by at least 10-fold, 100-fold, 1000-fold, or 10,000-fold) of the rate of evolution driven by
naturally acquired mutation of about 1 mutation per 10^9 positions per generation (see
Anderson et al., Proc. Natl. Acad. Sci. U.S.A. 93:906-907 (1996)). The "final product" may
be transferred to another host more desirable for utilization of the "shuffled" DNA. This is
particularly advantageous in situations where the more desirable host is less efficient as a
host for the many cycles of mutation/recombination because it lacks the molecular biology or
genetic tools available for other organisms such as E. coli.

2. Virus-Plasmid Recombination

The strategy used for plasmid-plasmid recombination can also be used for
virus-plasmid recombination, usually phage-plasmid recombination. However, some
additional comments particular to the use of viruses are appropriate. The initial substrates
for recombination are cloned into both plasmid and viral vectors. It is usually not critical
which substrate(s) is/are inserted into the viral vector and which into the plasmid, although
usually the viral vector should contain different substrate(s) from the plasmid. As before, the
plasmid (and the virus) typically contains a selective marker. The plasmid and viral vectors
can both be introduced into cells by transfection as described above. However, a more
efficient procedure is to transfect the cells with plasmid, select transfectants and infect the
transfectants with virus. Because the efficiency of infection of many viruses approaches
100% of cells, most cells transfected and infected by this route contain both a plasmid and
virus bearing different substrates.

Homologous recombination occurs between plasmid and virus, generating
both recombined plasmids and recombined virus. For some viruses, such as filamentous
phage, in which intracellular DNA exists in both double-stranded and single-stranded forms,
both can participate in recombination. Provided that the virus is not one that rapidly kills
cells, recombination can be augmented by use of electroporation or conjugation to transfer
plasmids between cells. Recombination can also be augmented for some types of virus by
allowing the progeny virus from one cell to reinfect other cells. For some types of virus,
virus-infected cells show resistance to superinfection. However, such resistance can be
overcome by infecting at high multiplicity and/or using mutant strains of the virus in which
resistance to superinfection is reduced.

The result of infecting plasmid-containing cells with virus depends on the
nature of the virus. Some viruses, such as filamentous phage, stably exist with a plasmid in
the cell and also extrude progeny phage from the cell. Other viruses, such as lambda having
a cosmid genome, can stably exist in a cell like plasmids without producing progeny virions.
Other viruses, such as the T-phage and lytic lambda, undergo recombination with the
plasmid but ultimately kill the host cell and destroy plasmid DNA. For viruses that infect cells
without killing the host, cells containing recombinant plasmids and virus can be
screened/selected using the same approach as for plasmid-plasmid recombination. Progeny
virus extruded by cells surviving selection/screening can also be collected and used as
substrates in subsequent rounds of recombination. For viruses that kill their host cells,
recombinant genes resulting from recombination reside only in the progeny virus. If the
screening or selective assay requires expression of recombinant genes in a cell, the
recombinant genes should be transferred from the progeny virus to another vector, e.g., a
plasmid vector, and retransfected into cells before selection/screening is performed.

For filamentous phage, the products of recombination are present both in
cells surviving recombination and in phage extruded from these cells. The dual source of
recombinant products provides some additional options relative to plasmid-plasmid
recombination. For example, DNA can be isolated from phage particles for use in a round of
in vitro recombination. Alternatively, the progeny phage can be used to transfect or infect
cells surviving a previous round of screening/selection, or fresh cells transfected with fresh
substrates for recombination.
3. Virus-Virus Recombination

The principles described for plasmid-plasmid and plasmid-viral recombination
can be applied to virus-virus recombination with a few modifications. The initial substrates
for recombination are cloned into a viral vector. Usually, the same vector is used for all
substrates. Preferably, the virus is one that, naturally or as a result of mutation, does not kill
cells. After insertion, some viral genomes can be packaged in vitro or using a packaging cell
line. The packaged viruses are used to infect cells at high multiplicity such that there is a
high probability that a cell will receive multiple viruses bearing different substrates.
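A similar back-of-the-envelope estimate applies to infection at high multiplicity. Assuming that the number of virions per cell is Poisson with mean m (the multiplicity of infection) and that each virion carries one of k equally represented substrates, the per-substrate counts are independent Poisson(m/k), so the probability that a cell receives at least two different substrates is 1 - e^(-m) - k(1 - e^(-m/k))e^(-m(k-1)/k). The function below is a hypothetical helper implementing that formula; it ignores resistance to superinfection, which can lower the effective multiplicity.

```python
import math


def p_multiple_substrates(moi: float, k: int) -> float:
    """Probability that a cell receives virions carrying >= 2 different substrates.

    Assumes Poisson(moi) virions per cell and k equally represented substrates,
    so each substrate's count is an independent Poisson(moi / k).
    """
    if k < 1:
        raise ValueError("need at least one substrate")
    p_none = math.exp(-moi)
    # Exactly one substrate type present: that one is present, all others absent.
    p_exactly_one = k * (1.0 - math.exp(-moi / k)) * math.exp(-moi * (k - 1) / k)
    return 1.0 - p_none - p_exactly_one


for moi in (1, 3, 10):
    print(f"MOI {moi}, 10 substrates: {p_multiple_substrates(moi, 10):.3f}")
```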

After the initial round of infection, subsequent steps depend on the nature of
infection as discussed in the previous section. For example, if the viruses have phagemid
(Sambrook et al., Molecular Cloning, CSH Press, 1987) genomes such as lambda cosmids
or M13, f1 or fd phagemids, the phagemids behave as plasmids within the cell and undergo
recombination simply by propagating within the cells. Recombination is particularly efficient
between single-stranded forms of intracellular DNA. Recombination can be augmented by
electroporation of cells.

Following selection/screening, cosmids containing recombinant genes can be
recovered from surviving cells, e.g., by heat induction of a cos' lysogenic host cell, or
extraction of DNA by standard procedures, followed by repackaging cosmid DNA in vitro.

If the viruses are filamentous phage, recombination of replicating form DNA
occurs by propagating the culture of infected cells. Selection/screening identifies colonies of
cells containing viral vectors having recombinant genes with improved properties, together
with infectious particles (i.e., phage or packaged phagemids) extruded from such cells.
Subsequent options are essentially the same as for plasmid-viral recombination.
4. Chromosome Recombination 

This format can be used especially to evolve chromosomal substrates. The
format is particularly preferred in situations in which many chromosomal genes contribute to
a phenotype or one does not know the exact location of the chromosomal gene(s) to be
evolved. The initial substrates for recombination are cloned into a plasmid vector. If the
chromosomal gene(s) to be evolved are known, the substrates constitute a family of
sequences showing a high degree of sequence identity but some divergence from the
chromosomal gene. If the chromosomal genes to be evolved have not been located, the
initial substrates usually constitute a library of DNA segments of which only a small number
show sequence identity to the gene or genes to be evolved. Divergence between plasmid-
borne substrate and the chromosomal gene(s) can be induced by mutagenesis or by
obtaining the plasmid-borne substrates from a different species than that of the cells bearing
the chromosome.

The plasmids bearing substrates for recombination are transfected into cells
having chromosomal gene(s) to be evolved. Evolution can occur simply by propagating the
culture, and can be accelerated by transferring plasmids between cells by conjugation or
electroporation. Evolution can be further accelerated by use of mutator host cells or by
seeding a culture of nonmutator host cells being evolved with mutator host cells and inducing
intercellular transfer of plasmids by electroporation or conjugation. Preferably, mutator host
cells used for seeding contain a negative selectable marker to facilitate isolation of a pure
culture of the nonmutator cells being evolved. Selection/screening identifies cells bearing
chromosomes and/or plasmids that have evolved toward acquisition of a desired function.

Subsequent rounds of recombination and selection/screening proceed in
similar fashion to those described for plasmid-plasmid recombination. For example, further
recombination can be effected by propagating cells surviving recombination in combination
with electroporation or conjugative transfer of plasmids. Alternatively, plasmids bearing
additional substrates for recombination can be introduced into the surviving cells. Preferably,
such plasmids are from a different incompatibility group and bear a different selective marker
than the original plasmids to allow selection for cells containing at least two different
plasmids. As a further alternative, plasmid and/or chromosomal DNA can be isolated from a
subpopulation of surviving cells and transfected into a second subpopulation. Chromosomal
DNA can be cloned into a plasmid vector before transfection.
5. Virus-Chromosome Recombination

As in the other methods described above, the virus is usually one that does
not kill the cells, and is often a phage or phagemid. The procedure is substantially the same
as for plasmid-chromosome recombination. Substrates for recombination are cloned into the
vector. Vectors including the substrates can then be transfected into cells or packaged in
vitro and introduced into cells by infection. Viral genomes recombine with host
chromosomes merely by propagating a culture. Evolution can be accelerated by allowing
intercellular transfer of viral genomes by electroporation, or reinfection of cells by progeny
virions. Screening/selection identifies cells having chromosomes and/or viral genomes that
have evolved toward acquisition of a desired function.

There are several options for subsequent rounds of recombination. For
example, viral genomes can be transferred between cells surviving selection/recombination
by electroporation. Alternatively, viruses extruded from cells surviving selection/screening
can be pooled and used to superinfect the cells at high multiplicity. Alternatively, fresh
substrates for recombination can be introduced into the cells, either on plasmid or viral
vectors.

II. Application of Recursive Sequence Recombination to Evolution of Polypeptides

In addition to the techniques described above, some further advantageous
modifications of these techniques for the evolution of polypeptides are described below.
These methods are referred to as "fine grain" and "coarse grain" shuffling. The coarse grain
methods allow one to exchange chunks of genetic material between substrate nucleic acids,
thereby limiting diversity in the resulting recombinants to exchanges or substitutions of
domains, restriction fragments, oligo-encoded blocks of mutations, or other arbitrarily defined
segments, rather than introducing diversity more randomly across the substrate. In contrast
to coarse grain shuffling, fine grain shuffling methods allow the generation of all possible
recombinations, or permutations, of a given set of very closely linked mutations, including
multiple permutations, within a single segment, such as a codon.
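Fine grain shuffling targets every combination of a small set of tightly linked mutations; for n mutations at distinct positions that is 2^n variants. Crossovers between adjacent bases are likely to be rare in fragment-based recombination, so such combinations are worth listing explicitly. The Python sketch below enumerates them for a single codon; the function name and the example codon and mutations are hypothetical.

```python
from itertools import combinations


def linked_variants(seq, mutations):
    """Return every variant of seq carrying any subset of the given mutations.

    mutations: list of (0-based position, alternative base) at distinct positions.
    n mutations give 2**n variants, including the unmutated sequence.
    """
    positions = [pos for pos, _ in mutations]
    if len(set(positions)) != len(positions):
        raise ValueError("this sketch allows one alternative base per position")
    variants = []
    for r in range(len(mutations) + 1):
        for subset in combinations(mutations, r):
            bases = list(seq)
            for pos, base in subset:
                bases[pos] = base
            variants.append("".join(bases))
    return variants


# Hypothetical example: three point mutations within one codon (GCT, alanine).
print(linked_variants("GCT", [(0, "A"), (1, "T"), (2, "G")]))
```

In the standard genetic code the eight resulting codons (GCT, ACT, GTT, GCG, ATT, ACG, GTG, ATG) encode Ala, Thr, Val, Ala, Ile, Thr, Val and Met, so the linked mutations in a single codon can reach several distinct residues.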

In some embodiments, coarse grain or fine grain shuffling techniques are not
performed as exhaustive searches of all possible mutations within a nucleic acid sequence.
Rather, these techniques are utilized to provide a sampling of the variation possible within a
gene based on known sequence or structural information. The size of the sample is typically
determined by the nature of the screen or selection process. For example, when a screen is
performed in a 96-well microtiter format, it may be preferable to limit the size of the
recombinant library to about 100 such microtiter plates for convenience in screening.
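The screening-capacity arithmetic in this example can be made explicit: 100 plates of 96 wells is 9,600 clones. Assuming, for illustration, that clones are drawn uniformly at random from a library of V equally represented variants, the expected fraction of variants observed after n clones is 1 - (1 - 1/V)^n, approximately 1 - e^(-n/V). The sketch below computes this and the number of clones needed for a target coverage; the uniform-library assumption and the function names are hypothetical, and real libraries are usually skewed, so actual coverage would be lower.

```python
import math

WELLS_PER_PLATE = 96
PLATES = 100
N_CLONES = WELLS_PER_PLATE * PLATES  # 9,600 clones screened


def expected_coverage(library_size: int, n_screened: int) -> float:
    """Expected fraction of distinct variants seen when n_screened clones are
    drawn uniformly at random (with replacement) from library_size variants."""
    if library_size < 2:
        raise ValueError("library must contain at least two variants")
    return 1.0 - (1.0 - 1.0 / library_size) ** n_screened


def clones_for_coverage(library_size: int, target: float) -> int:
    """Number of clones at which expected coverage first reaches target."""
    if library_size < 2:
        raise ValueError("library must contain at least two variants")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - 1.0 / library_size))


for size in (1_000, 3_200, 10_000, 100_000):
    print(f"library of {size}: expected coverage with {N_CLONES} clones = "
          f"{expected_coverage(size, N_CLONES):.3f}")
```

Under this model, screening about three times as many clones as there are variants gives roughly 95% expected coverage, so 9,600 wells sample a library of about 3,200 variants well.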




The techniques described herein are especially useful in the recombination of
genes from gene families, wherein diversity in nucleotide sequence is provided in whole or in
part by naturally occurring differences in the nucleotide sequences of the genes in the family.

A "gene family" as used herein is intended to include genes with similar
function, such as, but not limited to, interferons or interleukins; genes that are believed to be
derived by descent from a common ancestor; and genes that encode proteins that are
structurally homologous, such as four helix bundle proteins.

Thus, for example, DNA or protein sequences can be aligned by computer
algorithms, such as those described in the monograph on bioinformatics by Schomburg and
Lessel (Schomburg and Lessel, Bioinformatics: From Nucleic Acids and Proteins to Cell
Metabolism, October 9-11, 1995, Braunschweig, Germany). These algorithms can
determine the likelihood that two sequences, or subdomains of sequences, are related to
each other by descent from a common ancestor. Sequences that are judged to be derived
by descent from a common ancestor comprise a "homologous gene family", and DNA
shuffling can be used to accelerate the evolution of these gene families.

Furthermore, many distinct protein sequences are consistent with similar
protein folds, and such families of sequences can be said to comprise "structurally
homologous" gene families. The superfamily of four helix bundle proteins is such a family.
Although this is a very large family of functionally highly diverse proteins having this fold,
ranging from cytokines to enzymes to DNA binding proteins, it is unlikely that these proteins
are derived from a common ancestor. It is more likely that they have "convergently evolved"
to have similar protein folds. There are now functional algorithms (Dahiyat et al., Science
278:82-87 (1997)) that allow one to design proteins with desired protein folds, and such
algorithms have been used to design, for example, zinc finger motifs that are not related in
primary sequence to any known natural protein.

A. Use of Restriction Enzyme Sites to Recombine Mutations

In some situations it is advantageous to use restriction enzyme sites in
nucleic acids to direct the recombination of mutations in a nucleic acid sequence of interest.
These techniques are particularly preferred in the evolution of fragments that cannot readily
be shuffled by existing methods due to the presence of repeated DNA or other problematic
primary sequence motifs. They are also preferred for shuffling large fragments (typically
greater than 10 kb), such as gene clusters, that cannot be readily shuffled and "PCR-
amplified" because of their size. Although fragments of up to 50 kb have been reported to be
amplified by PCR (Barnes, Proc. Natl. Acad. Sci. (U.S.A.) 91:2216-2220 (1994)), amplification
can be problematic for fragments over 10 kb, and thus alternative methods for shuffling in the
range of 10-50 kb and beyond are preferred. Preferably, the restriction endonucleases used
are of the Class II type (Sambrook et al., Molecular Cloning, CSH Press, 1987) and, of these,
preferably those that generate nonpalindromic sticky-end overhangs, such as AlwN I, Sfi I or
BstX I. These enzymes generate nonpalindromic ends that allow efficient ordered
reassembly with DNA ligase. Typically, restriction enzyme (or endonuclease) sites are
identified by conventional restriction enzyme mapping techniques (Sambrook et al.,
Molecular Cloning, CSH Press, 1987), by analysis of sequence information for that gene, or
by introducing desired restriction sites into a nucleic acid sequence by synthesis (i.e., by
incorporation of silent mutations).
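Because Sfi I cuts within a stretch of arbitrary bases, the 3-nt overhang it leaves can differ from site to site, which is what permits ordered reassembly. The Python sketch below, an illustration rather than part of the described method, locates Sfi I sites (GGCCNNNN^NGGCC) in a sequence, extracts each 3' overhang, and checks that no two ends could misligate. The helper names and the simple misligation rule (an overhang that is self-complementary, repeated, or the reverse complement of another) are assumptions made for this example; real designs would also consider ligation between partially matching ends.

```python
import re

# Sfi I recognizes GGCCNNNN^NGGCC (13 bp). The top strand is cut after base 8
# and the bottom strand after base 5 (top-strand numbering), leaving a 3-nt
# 3' overhang made of site bases 6-8, i.e. the 0-based slice [5:8].
SFI_SITE = re.compile(r"(?=(GGCC[ACGT]{5}GGCC))")  # lookahead also finds overlapping sites
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sfi_overhangs(seq: str) -> list[tuple[int, str]]:
    """(0-based site start, 3' overhang) for every Sfi I site in seq."""
    return [(m.start(), m.group(1)[5:8]) for m in SFI_SITE.finditer(seq.upper())]


def ordered_reassembly_ok(overhangs: list[str]) -> bool:
    """True if, under this simple model, no end can ligate to the wrong partner:
    no overhang is self-complementary, repeated, or the reverse complement of
    another overhang."""
    seen: set[str] = set()
    for oh in overhangs:
        if oh == revcomp(oh) or oh in seen or revcomp(oh) in seen:
            return False
        seen.add(oh)
    return True


gene = "AA" + "GGCCAACGTGGCC" + "TTTT" + "GGCCTTTCAGGCC" + "AA"  # toy sequence
sites = sfi_overhangs(gene)
print(sites)  # [(2, 'ACG'), (19, 'TTC')]
print(ordered_reassembly_ok([oh for _, oh in sites]))  # True
```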

The DNA substrate molecules to be digested can be derived either from in vivo
replicated DNA, such as a plasmid preparation, or from PCR-amplified nucleic acid
fragments harboring the restriction enzyme recognition sites of interest, preferably near the
ends of the fragment. Typically, at least two variants of a gene of interest, each having one
or more mutations, are digested with at least one restriction enzyme determined to cut within
the nucleic acid sequence of interest. The restriction fragments are then joined with DNA
ligase to generate full-length genes having shuffled regions. The number of regions shuffled
depends on the number of cuts within the nucleic acid sequence of interest. The shuffled
molecules can be introduced into cells as described above and screened or selected for a
desired property. Nucleic acid can then be isolated from pools (libraries) or clones having
desired properties and subjected to the same procedure until a desired degree of
improvement is obtained.
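The relation between the number of cuts and the number of shuffled regions can be made concrete: c cuts divide each gene into c + 1 regions, so v variants religated in order can yield up to v^(c+1) full-length combinations. The Python sketch below enumerates them for an idealized case, assumed here, in which all variants are aligned, equal in length and cut at the same positions; `split_at` and `shuffle_by_digestion` are hypothetical helper names.

```python
from itertools import product


def split_at(seq: str, cuts: list[int]) -> list[str]:
    """Split seq into len(cuts) + 1 regions at 0-based cut positions."""
    bounds = [0, *sorted(cuts), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def shuffle_by_digestion(variants: list[str], cuts: list[int]) -> set[str]:
    """All distinct full-length genes obtainable by religating, in order, the
    regions produced by cutting every variant at the same positions."""
    regions = [split_at(v, cuts) for v in variants]
    # Column i collects the distinct versions of region i across all variants.
    columns = [sorted({r[i] for r in regions}) for i in range(len(cuts) + 1)]
    return {"".join(parts) for parts in product(*columns)}


library = shuffle_by_digestion(["AAAA", "TTTT"], cuts=[2])
print(sorted(library))  # ['AAAA', 'AATT', 'TTAA', 'TTTT']
```

Identical regions shared by several variants collapse into one entry per column, which is why v^(c+1) is an upper bound rather than an exact count.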

In some embodiments, at least one DNA substrate molecule or fragment
thereof is isolated and subjected to mutagenesis. In some embodiments, the pool or library
of religated restriction fragments is subjected to mutagenesis before the digestion-ligation
process is repeated. "Mutagenesis" as used herein comprises techniques known in the art,
such as PCR mutagenesis, oligonucleotide-directed mutagenesis and site-directed
mutagenesis, as well as recursive sequence recombination by any of the techniques
described herein.

An example of the use of this format is the manipulation of polyketide
clusters. Polyketide clusters (Khosla et al., TIBTECH 14, September 1996) are typically 10
to 100 kb in length, specifying multiple large polypeptides that assemble into very large
multienzyme complexes. Because of the modular nature of these complexes and of the
biosynthetic pathway, nucleic acids encoding protein modules can be exchanged between
different polyketide clusters to generate novel and functional chimeric polyketides. The
introduction of rare restriction endonuclease sites such as Sfi I (eight-base recognition,
nonpalindromic overhangs) at nonessential sites between polypeptides, or in introns
engineered within polypeptides, would provide "handles" with which to manipulate the
exchange of nucleic acid segments using the technique described above.

B. Reassembly PCR

A further technique for recursively recombining mutations in a nucleic acid
sequence uses "reassembly PCR". This method can be used to assemble multiple
segments that have been separately evolved into a full-length nucleic acid template such as
a gene. The technique is performed when a pool of advantageous mutants is known from
previous work or has been identified by screening mutants that may have been created by
any mutagenesis technique known in the art, such as PCR mutagenesis, cassette
mutagenesis, doped oligo mutagenesis, chemical mutagenesis, or propagation of the DNA
template in vivo in mutator strains. Boundaries defining segments of a nucleic acid
sequence of interest preferably lie in intergenic regions, introns, or areas of a gene not likely
to have mutations of interest. Preferably, oligonucleotide primers (oligos) are synthesized for
PCR amplification of segments of the nucleic acid sequence of interest, such that the
sequences of the oligonucleotides overlap the junctions of two segments. The overlap
region is typically about 10 to 100 nucleotides in length. Each of the segments is amplified
with a set of such primers. The PCR products are then "reassembled" according to
assembly protocols such as those used in Sections IA-B above to assemble randomly
fragmented genes. In brief, in an assembly protocol the PCR products are first purified away
from the primers by, for example, gel electrophoresis or size exclusion chromatography.
Purified products are mixed together and subjected to about 1-10 cycles of denaturing,
reannealing and extension in the presence of polymerase, deoxynucleoside
triphosphates (dNTPs) and appropriate buffer salts, in the absence of additional primers
("self-priming"). Subsequent PCR with primers flanking the gene is used to amplify the yield
of fully reassembled and shuffled genes. This method is necessarily "coarse grain" and
hence recombines mutations only in a blockwise fashion, which is an advantage for some
searches, such as when recombining allelic variants of multiple genes within an operon.
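The overlap requirement can be illustrated in silico. In the Python sketch below, which assumes a single linear template, segment boundaries given as 0-based positions and an overlap centred on each junction (hypothetical choices made for illustration), each segment's PCR product is extended so that neighbouring products share `overlap` nucleotides. Reassembly then joins products whose ends match, much as self-priming extension does. Mutations lying outside the overlaps are carried over from whichever product supplies that segment.

```python
def segment_products(template: str, boundaries: list[int], overlap: int = 20) -> list[str]:
    """PCR products for each segment, extended so that neighbouring products
    share `overlap` nt centred on each junction (0-based boundary positions)."""
    left = overlap // 2
    right = overlap - left
    bounds = [0, *sorted(boundaries), len(template)]
    products = []
    for i in range(len(bounds) - 1):
        start = bounds[i] - left if i > 0 else 0
        end = bounds[i + 1] + right if i + 1 < len(bounds) - 1 else len(template)
        products.append(template[max(start, 0):min(end, len(template))])
    return products


def reassemble(products: list[str], overlap: int = 20) -> str:
    """Join products end to end through their shared overlaps, as self-priming
    extension would; raises if neighbouring ends do not match."""
    gene = products[0]
    for nxt in products[1:]:
        if gene[-overlap:] != nxt[:overlap]:
            raise ValueError("adjacent products do not share the expected overlap")
        gene += nxt[overlap:]
    return gene


wild_type = "ACGT" * 25                          # toy 100-nt template
mutant = wild_type[:50] + "T" + wild_type[51:]   # point mutation at position 50
wt_parts = segment_products(wild_type, [40, 70], overlap=10)
mut_parts = segment_products(mutant, [40, 70], overlap=10)
# Middle segment taken from the mutant, outer segments from the wild type.
chimera = reassemble([wt_parts[0], mut_parts[1], wt_parts[2]], overlap=10)
print(chimera == mutant)  # True
```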

In some embodiments, the resulting reassembled genes are subjected to 
mutagenesis before the process is repeated. 

In some embodiments, oligonucleotide primers that incorporate uracil are used
for PCR amplification. Typically, uracil is incorporated at one site in the oligonucleotide. The
products are treated with uracil glycosylase, thereby generating a single-stranded overhang,
and are reassembled in an ordered fashion by a method such as that disclosed by
Rashtchian (Current Biology 6:30-36 (1995)).

In a further embodiment, the PCR primers for amplification of segments of the
nucleic acid sequence of interest are used to introduce variation into the gene of interest as
follows. Mutations at sites of interest in a nucleic acid sequence are identified by screening
or selection, by sequencing homologues of the nucleic acid sequence, and so on.
Oligonucleotide PCR primers are then synthesized that encode wild-type or mutant
information at the sites of interest. These primers are then used in PCR mutagenesis to
generate libraries of full-length genes encoding permutations of wild-type and mutant
information at the designated positions. This technique is typically advantageous when the
screening or selection process is expensive, cumbersome or impractical relative to the cost
of sequencing the genes of mutants of interest and synthesizing mutagenic
oligonucleotides.
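With k designated positions, the permutations of wild-type and mutant information described above number 2^k. The Python sketch below lists them explicitly for point mutations supplied as a hypothetical mapping from 0-based position to mutant base. In practice the library is generated by the mixed primers, and which permutations are actually recovered depends on library size and sampling.

```python
from itertools import product


def permutation_library(wild_type: str, mutations: dict[int, str]) -> list[str]:
    """Every combination of wild-type or mutant base at each designated
    0-based position: 2 ** len(mutations) sequences."""
    positions = sorted(mutations)
    library = []
    for choice in product((False, True), repeat=len(positions)):
        seq = list(wild_type)
        for pos, use_mutant in zip(positions, choice):
            if use_mutant:
                seq[pos] = mutations[pos]
        library.append("".join(seq))
    return library


print(permutation_library("AAAA", {0: "G", 3: "T"}))
# ['AAAA', 'AAAT', 'GAAA', 'GAAT']
```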

An example of this method is the evolution of an improved Taq polymerase,
as described in detail below. Mutant proteins resulting from application of the method are
identified and assayed in a sequencing reaction to identify mutants with improved
sequencing properties. This is typically done in a high-throughput format (see, for example,
Broach et al., Nature 384 (Supp): 14-16 (1996)) to yield, after screening, a small number
(e.g., about 2 to 100) of candidate recombinants for further evaluation. The mutant genes
can then be sequenced to provide information regarding the location of the mutations. The
corresponding mutagenic oligonucleotide primers can be synthesized from this information
and used in a reassembly reaction as described above to efficiently generate a library with an
average of many mutations per gene. One or more rounds of this protocol allow an efficient
search for improved variants of the Taq polymerase.
C. Enrichment for Mutant Sequence Information

In some embodiments of the invention, recombination reactions, such as
those discussed above, are enriched for mutant sequences so that the multiple mutant
spectrum, i.e., the possible combinations of mutations, is sampled more efficiently. The
rationale is as follows. Assume that a number, n, of mutant clones with improved activity is
obtained, wherein each clone has a single point mutation at a different position in the nucleic
acid sequence. If this population of mutant clones, with an average of one mutation of
interest per nucleic acid sequence, is then put into a recombination reaction, the resulting
population will still have an average of one mutation of interest per nucleic acid sequence,
distributed according to a Poisson distribution, leaving the multiple mutation spectrum
relatively unpopulated.
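The Poisson argument can be checked numerically. Assuming, as the text does, that mutations of interest are distributed independently with a mean of one per sequence, the Python sketch below computes the fraction of recombinants carrying at least a given number of mutations. With a mean of 1, about 37% of clones carry no mutation of interest and only about 26% carry two or more (1 - 2/e).

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a sequence carries exactly k mutations."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def fraction_with_at_least(k_min: int, mean: float) -> float:
    """Fraction of sequences carrying k_min or more mutations."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(k_min))


for k_min in (1, 2, 3):
    print(k_min, round(fraction_with_at_least(k_min, 1.0), 3))
```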

The amount of screening required to identify recombinants having two or
more mutations can be dramatically reduced by the following technique. The nucleic acid
sequences of interest are obtained from a pool of mutant clones and prepared as fragments,
typically by digestion with a restriction endonuclease, by sonication, or by PCR amplification.
The fragments are denatured and then allowed to reanneal, thereby generating mismatched
hybrids in which one strand of a mutant has hybridized with a complementary strand from a
different mutant or wild-type clone. The reannealed products are then fragmented into
fragments of about 20-100 bp, for example by the use of DNase I. This fragmentation
reaction has the effect of segregating regions of the template containing mismatches (mutant
information) from those encoding wild-type sequence. The mismatched hybrids can then be
affinity purified using aptamers, dyes, or other agents that bind to mismatched DNA. A
preferred embodiment is the use of a mutS protein affinity matrix (Wagner et al., Nucleic Acids
Res. 23(19):3944-3948 (1995); Su et al., Proc. Natl. Acad. Sci. (USA) 83:5057-5061 (1986)),
with a preferred step of amplifying the affinity-purified material in vitro prior to an assembly
reaction. This amplified material is then put into an assembly PCR reaction as described
above. Optionally, this material can be titrated against the original mutant pool (e.g., from
about 100% to 10% of the mutS-enriched pool) to control the average number of mutations
per clone in the next round of recombination.
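One simple way to reason about the titration step is to treat the mean number of mutations per assembled clone as the weighted average of the two pools. This linear mixing model is an assumption made for the sketch below: it presumes that fragments from both pools are incorporated in proportion to their amounts. The pool means are hypothetical inputs that would in practice be estimated, for example by sequencing a sample of clones.

```python
def mean_mutations(fraction_enriched: float, mean_enriched: float,
                   mean_original: float = 1.0) -> float:
    """Expected mutations per clone when the mutS-enriched pool makes up
    `fraction_enriched` of the assembly input (linear mixing assumption)."""
    if not 0.0 <= fraction_enriched <= 1.0:
        raise ValueError("fraction_enriched must lie between 0 and 1")
    return fraction_enriched * mean_enriched + (1.0 - fraction_enriched) * mean_original


def fraction_for_target(target: float, mean_enriched: float,
                        mean_original: float = 1.0) -> float:
    """Invert mean_mutations: enriched-pool fraction giving `target` mutations/clone."""
    if mean_enriched == mean_original:
        raise ValueError("pools have the same mean; titration has no effect")
    fraction = (target - mean_original) / (mean_enriched - mean_original)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("target is outside the range reachable by titration")
    return fraction


# Hypothetical example: the enriched pool averages 5 mutations per clone.
for f in (1.0, 0.5, 0.1):
    print(f, mean_mutations(f, 5.0))
```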

Another application of this method is the assembly of gene constructs that
are enriched for polymorphic bases occurring as natural or selected allelic variants or as
differences between homologous genes of related species. For example, one may have
several varieties of a plant that are believed to have heritable variation in a trait of interest
(e.g., drought resistance). It is then of interest to construct a library of these variant genes
containing many mutations per gene. MutS selection can be applied in combination with the
assembly techniques described herein to generate such a pool of recombinants that are
highly enriched for polymorphic ("mutant") information. In some embodiments, the pool of
recombinant genes is provided in a transgenic host. Recombinants can be further evolved
by PCR amplification of the transgene from transgenic organisms that are determined to
have an improved phenotype and by applying the formats described in this invention to
further evolve them.

D. Intron-Driven Recombination

In some instances, the substrate molecules for recombination have uniformly
low homology or sporadically distributed regions of homology, or the region of homology is
relatively small (for example, about 10-100 bp), as with phage-displayed peptide ligands.
These factors can reduce the efficiency and randomness of recombination in RSR. In some
embodiments of the invention, this problem is addressed by introducing introns between
coding exons in sequences encoding protein homologues. In further embodiments of the
invention, inteins can be used (Chong et al., J. Biol. Chem. 271:22159-22168 (1996)).

In this method, a nucleic acid sequence, such as a gene or gene family, is
arbitrarily defined to have segments. The segments are preferably exons. Introns are
engineered between the segments. Preferably, the intron inserted between the first and
second segments is at least about 10% divergent from the intron inserted between the
second and third segments, the intron inserted between the second and third segments is
at least about 10% divergent from the introns inserted between any of the previous segment
pairs, and so on through segments n and n + 1. The introns between any given pair of exons
will thus initially be identical in all clones in the library, whereas the exons can be arbitrarily
divergent in sequence. The introns therefore provide homologous DNA sequences that
permit application of any of the described methods for RSR, while the exons can be
arbitrarily small or divergent in sequence and can evolve to an arbitrarily large degree of
sequence divergence without a significant loss in recombination efficiency. Restriction
sites can also be engineered into the intronic nucleic acid sequence so as to allow directed
reassembly of restriction fragments. The starting exon DNA may be synthesized de novo
from sequence information, or may be present in any nucleic acid preparation (e.g.,
genomic DNA, cDNA, libraries, and so on). For example, 1 to 10 nonhomologous introns can
be designed to direct recombination of the nucleic acid sequences of interest by placing them
between exons. The sequence of the introns can be obtained in whole or in part from known
intron sequences. Preferably, the introns are self-splicing. Ordered sets of introns and exon
libraries are assembled into functional genes by standard methods (Sambrook et al.,
Molecular Cloning, CSH Press (1987)).
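The 10% divergence criterion can be checked with a short script. The Python sketch below uses simple per-position mismatch counting, which assumes equal-length, pre-aligned intron sequences (an illustrative simplification; introns of unequal length would require a proper alignment). It tests every pair in a candidate intron set against the threshold.

```python
from itertools import combinations


def divergence(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned introns")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def introns_sufficiently_divergent(introns: list[str], min_divergence: float = 0.10) -> bool:
    """True if every pair of introns differs at no less than `min_divergence` of positions."""
    return all(divergence(a, b) >= min_divergence for a, b in combinations(introns, 2))


candidate = ["A" * 20, "A" * 17 + "CCC", "A" * 19 + "T"]  # toy intron set
print(introns_sufficiently_divergent(candidate))  # False: introns 1 and 3 differ by only 5%
```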

Any of the formats for in vitro or in vivo recombination described herein can
be applied to recursive exon shuffling. A preferred format is to place nonpalindromic
restriction sites, such as Sfi I, into the intronic sequences to promote shuffling. Pools of
selected clones are digested with Sfi I and religated. The nonpalindromic overhangs
promote ordered reassembly of the shuffled exons. These libraries of genes can be
expressed and screened for desired properties, then subjected to further recursive rounds of
recombination by this process. In some embodiments, the libraries are subjected to
mutagenesis before the process is repeated.

An example of how an intron can be used advantageously in a mammalian
library format is as follows. An intron containing a lox site (Sauer et al., Proc. Natl. Acad. Sci.
(USA) 85:5166-5170 (1988)) is arbitrarily introduced between amino acids 92 and 93 in each
alpha interferon parental substrate. A library of 10^4 chimeric interferon genes is made for
each of the two exons (residues 1-92 and residues 93-167), cloned into a replicating plasmid
vector, and introduced into target cells. The number 10^4 is arbitrarily chosen for
convenience in screening. An exemplary vector for expression in mammalian cells would
contain an SV40 origin, with the host cells expressing SV40 large T antigen, so as to allow
transient expression of the interferon constructs. The cells are challenged with a cytopathic
virus such as vesicular stomatitis virus (VSV) in an interferon protection assay (e.g., Meister
et al., J. Gen. Virol. 67:1633-1643 (1986)). Cells surviving due to expression of interferon
are recovered, and the two libraries of interferon genes are PCR amplified and recloned into
a vector that can be amplified in E. coli. The amplified plasmids are then transfected at high
multiplicity (e.g., 10 micrograms of plasmid per 10* cells) into a cre-expressing host that can
support replication of that vector. The presence of cre in the host cells promotes efficient
recombination at the lox site in the interferon intron, thus shuffling the selected sets of exons.
This population of cells is then used in a second round of selection by viral challenge, and
the process is applied recursively. In this format, the cre recombinase is preferably
expressed transiently from a cotransfected molecule that cannot replicate in the host. Thus,
after segregation of recombinants from the cre-expressing plasmid, no further recombination
will occur and selection can be performed on genetically stable exon permutations. The
method can be used with more than one intron, with recombination-enhancing sequences
other than cre/lox (e.g., int/xis), and with other vector systems such as, but not limited to,
retroviruses, adenovirus or adeno-associated virus.

E. Synthetic Oligonucleotide-Mediated Recombination

1. Oligo bridge across sequence space

In some embodiments of the invention, a search of a region of sequence
space defined by a set of substrates, such as members of a gene family, having less than
about 80%, more typically less than about 50%, homology is desired. This region, which
can be part or all of a gene, is arbitrarily delineated into segments. The segment borders
can be chosen randomly, based on correspondence with natural exons, based on structural
considerations (loops, alpha helices, subdomains, whole domains, hydrophobic core,
surface, dynamic simulations), or based on correlations with genetic mapping data.

Typically, the segments are then amplified by PCR with a pool of "bridge"
oligonucleotides at each junction. Thus, if a set of five genes is broken into three segments
A, B and C, so that there are five versions of each segment (A1, A2, ..., C4, C5), twenty-five
oligonucleotides are made for each strand of the A-B junction, where each bridge oligo has
20 bases of homology to one of the A segments and one of the B segments. In some cases,
the number of required oligonucleotides can be reduced by choosing segment boundaries
that are identical in some or all of the gene family members. Oligonucleotides are similarly
synthesized for the B-C junction. The family of A domains is amplified by PCR with an
outside generic A primer and the pool of A-B junction oligonucleotides; the B domains with
the A-B plus the B-C bridge oligonucleotides; and the C domains with the B-C bridge
oligonucleotides plus a generic outside primer. Full-length genes are then made by
assembly PCR or by the dUTP/uracil glycosylase methods described above. Preferably,
products from this step are subjected to mutagenesis before the process of selection and
recombination is repeated, until a desired level of improvement or the evolution of a desired
property is obtained. This is typically determined using a screen or selection appropriate for
the protein and property of interest.
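The bookkeeping for bridge oligonucleotides is easy to script. In the Python sketch below, each bridge oligo for one strand is modelled as the last `arm` bases of a left-hand segment version joined to the first `arm` bases of a right-hand version (20 bases each, following the text); the complementary strand would use the reverse complements. Collecting the oligos in a set shows how identical junction sequences among family members reduce the number that must be synthesized. Segment sequences and helper names are hypothetical.

```python
def bridge_oligos(left_versions: list[str], right_versions: list[str],
                  arm: int = 20) -> set[str]:
    """Distinct one-strand bridge oligos for one junction: every left-hand
    segment version paired with every right-hand version."""
    return {left[-arm:] + right[:arm]
            for left in left_versions for right in right_versions}


def naive_bridge_count(versions_per_segment: list[int], strands: int = 2) -> int:
    """Oligo count when all segment versions differ at every junction."""
    pairs = zip(versions_per_segment, versions_per_segment[1:])
    return strands * sum(a * b for a, b in pairs)


# Five versions of segments A, B and C: 25 oligos per strand per junction,
# so 2 junctions x 2 strands x 25 = 100 oligos in total.
print(naive_bridge_count([5, 5, 5]))  # 100
```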

This method is illustrated below for the recombination of eleven homologous
human alpha interferon genes.
2. Site-Directed Mutagenesis (SDM) with Oligonucleotides Encoding
Homologue Mutations Followed by Shuffling
In some embodiments of the invention, sequence information from one or
more substrate sequences is added to a given "parental" sequence of interest, with
subsequent recombination between rounds of screening or selection. Typically, this is done
with site-directed mutagenesis performed by techniques well known in the art (Sambrook et
al., Molecular Cloning, CSH Press (1987)), with one substrate as template and
oligonucleotides encoding single or multiple mutations from other substrate sequences, e.g.,
homologous genes. After screening or selection for an improved phenotype of interest, the
selected recombinant(s) can be further evolved using the RSR techniques described herein.
After screening or selection, site-directed mutagenesis can be done again with another
collection of oligonucleotides encoding homologue mutations, and the above process
repeated until the desired properties are obtained.

When the difference between two homologues is one or more single point
mutations in a codon, degenerate oligonucleotides can be used that encode the sequences
in both homologues. One oligo may include many such degenerate codons and still allow
one to exhaustively search all permutations over that block of sequence. An example of this
is provided below for the evolution of alpha interferon genes.
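A degenerate codon covering two homologues can be written with IUPAC nucleotide codes. The Python sketch below (helper names are hypothetical) merges codons position by position and expands the result. When the codons differ at a single position, the degenerate codon encodes exactly the two parental codons; when they differ at two or more positions, it also encodes non-parental combinations, which enlarges the library.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def degenerate_codon(*codons: str) -> str:
    """IUPAC codon whose positions cover exactly the bases seen in the given codons."""
    return "".join(CODE_FOR_BASES[frozenset(column)] for column in zip(*codons))


def expand(degenerate: str) -> list[str]:
    """All concrete sequences encoded by an IUPAC-degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in degenerate))]


print(degenerate_codon("GAT", "GAA"), expand("GAW"))  # GAW ['GAA', 'GAT']
print(degenerate_codon("CAT", "GAA"), expand("SAW"))  # SAW ['CAA', 'CAT', 'GAA', 'GAT']
```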

When the homologue sequence space is very large, it can be advantageous
to restrict the search to certain variants. Thus, for example, computer modelling tools
(Lathrop et al., J. Mol. Biol. 255:641-665 (1996)) can be used to model each homologue
mutation onto the target protein and discard any mutations that are predicted to grossly
disrupt structure and function.

F. Recombination Directed by Host Machinery

In some embodiments of the invention, DNA substrate molecules are
introduced into cells, wherein the cellular machinery directs their recombination. For
example, a library of mutants is constructed and screened or selected for mutants with
improved phenotypes by any of the techniques described herein. The DNA substrate
molecules encoding the best candidates are recovered by any of the techniques described
herein, then fragmented and used to transfect a mammalian host, and the transfectants are
screened or selected for improved function. The DNA substrate molecules are recovered
from the mammalian host, such as by PCR, and the process is repeated until a desired level
of improvement is obtained. In some embodiments, the fragments are denatured and
reannealed prior to transfection, coated with recombination-stimulating proteins such as
recA, or co-transfected with a selectable marker such as Neo^r to allow positive selection for
cells receiving recombined versions of the gene of interest.
For example, this format is preferred for the in vivo affinity maturation of an
antibody by RSR. In brief, a library of mutant antibodies is generated, as described herein
for the 48G7 affinity maturation. This library is FACS purified with ligand to enrich for the
0.1-10% of antibodies with the highest affinity. The V region genes are recovered by PCR,
fragmented, and cotransfected or electroporated with a vector into which reassembled V
region genes can recombine. DNA substrate molecules are recovered from the cotransfected
cells, and the process is repeated until the desired level of improvement is obtained. Other
embodiments include reassembling the V regions prior to electroporation so that an intact
V region exon can recombine into an antibody expression cassette. Further embodiments
include the use of this format for other eukaryotic genes or for the evolution of whole viruses.

G. Phagemid-Based Assembly

In some embodiments of the invention, a gene of interest is cloned into a
vector that generates single-stranded DNA, such as a phagemid. The resulting DNA
substrate is mutagenized by RSR or by any other method known in the art, transfected into
host cells, and subjected to a screen or selection for a desired property or improved
phenotype. DNA from the selected or screened phagemids is amplified by, for example,
PCR or plasmid preparation. This DNA preparation contains the various mutant sequences
that one wishes to permute. This DNA is fragmented, denatured, and annealed with
single-stranded DNA (ssDNA) phagemid template (ssDNA encoding the wild-type gene and
vector sequences). A preferred embodiment is the use of dut(-) ung(-) host strains such as
CJ236 (Sambrook et al., Molecular Cloning, CSH Press (1987)) for the preparation of ssDNA.

Gaps in the annealed template are filled with DNA polymerase and ligated to
form closed relaxed circles. Since multiple fragments can anneal to the phagemid, the newly
synthesized strand now consists of shuffled sequences. These products are transformed
into a mutS strain of E. coli that is dut+ ung+. Because the uracil-containing template strand
prepared in the dut(-) ung(-) strain is selected against in an ung+ host, the newly synthesized
shuffled strand is preferentially recovered. Phagemid DNA is recovered from the transformed
host and subjected again to this protocol until the desired level of improvement is obtained.
The gene encoding the protein of interest in this library of recovered phagemid DNA can be
mutagenized by any technique, including RSR, before the process is repeated.
III. Improved Protein Expression 

While recombinant DNA technology has proved to be a very general method
for obtaining large, pure and homogeneous quantities of almost all nucleic acid sequences
of interest, similar generality has not yet been achieved for the production of large amounts
of pure, homogeneous protein in recombinant form. A likely explanation is that protein
expression, folding, localization and stability are intrinsically more complex and unpredictable
than the corresponding processes for DNA. The yield of expressed protein is a complex
function of transcription rates, translation rates, interactions with the ribosome, interaction of
the nascent polypeptide with chaperonins and other proteins in the cell, efficiency of
oligomerization, interaction with components of secretion and other protein trafficking
pathways, protease sensitivity, and the intrinsic stability of the final folded state.
Optimization of such complex processes is well suited to the application of RSR. The
following methods detail strategies for applying RSR to the optimization of protein
expression.

A. Evolution of Mutant Genes with Improved Expression Using RSR on
Codon Usage Libraries

The negative effect of rare E. coli codons on expression of recombinant
proteins in this host has been clearly demonstrated (Rosenberg et al., J. Bacteriol.
175:716-722 (1993)). However, general rules for choosing codon usage patterns that
optimize expression of functional protein have been elusive. In some embodiments of the
invention, protein expression is optimized by changing the codons used in the gene of
interest, based on the degeneracy of the genetic code. Typically, this is accomplished by
synthesizing the gene using degenerate oligonucleotides. In some embodiments the
degenerate oligonucleotides have the general structure of about 20 nucleotides of identity to
a DNA substrate molecule encoding a protein of interest, followed by a region of about 20
degenerate nucleotides that encode a region of the protein, followed by another region of
about 20 nucleotides of identity. In a preferred embodiment, the region of identity uses
preferred codons for the host. In a further embodiment, the oligonucleotides are identical to
the DNA substrate in at least one 5' and one 3' nucleotide, but have at least 85% sequence
homology to the DNA substrate molecule, with the difference due to the use of degenerate
codons. In some embodiments, a set of such degenerate oligonucleotides is used in which
each oligonucleotide overlaps with another by the general formula n - 10, wherein n is the
length of the oligonucleotide. Such oligonucleotides are typically about 20-1000 nucleotides
in length. The assembled genes are then cloned, expressed, and screened or selected for
improved expression. The assembled genes can be subjected to recursive recombination
methods as described above until the desired improvement is achieved.
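To gauge how large a codon usage library a degenerate region can encode, one can multiply the number of synonymous codons for each residue. The Python sketch below does this using the codon counts of the standard genetic code; the peptide is a hypothetical input, and a region of about 20 degenerate nucleotides corresponds to six or seven codons. Note that IUPAC-degenerate oligonucleotides cannot always represent exactly the synonymous set (for example, the six serine codons TCN and AGY fall in two separate codon blocks), so a real oligonucleotide may encode fewer synonymous variants, or additional non-synonymous ones.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code
# (61 sense codons in total).
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def synonymous_library_size(peptide: str) -> int:
    """Distinct DNA sequences that encode `peptide` using only synonymous codons."""
    return prod(SYNONYMOUS_CODONS[residue] for residue in peptide.upper())


print(synonymous_library_size("GLS"))  # 4 * 6 * 6 = 144
```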

For example, this technique can be used to evolve bovine intestinal alkaline
phosphatase (BIAP) for active expression in E. coli. This enzyme is commonly used as a
reporter enzyme in assay formats such as ELISA. The cloned gene cannot be expressed in
active form, in good yield, in a prokaryotic host such as E. coli. Development of such an
expression system would provide access to inexpensive expression technology for BIAP
and, importantly, for engineered variants with improved activity or chemical coupling
properties (such as chemical coupling to antibodies). A detailed example is provided in the
Experimental Examples section.

B. Improved Folding 
In some embodiments of the invention, proteins of interest, when
overexpressed or expressed in heterologous hosts, form inclusion bodies, with the majority of
the expressed protein being found in insoluble aggregates. Recursive sequence
recombination techniques can be used to optimize folding of such target proteins. There are
several ways to improve folding, including evolving the target protein of interest and
evolving chaperonin proteins.

1. Evolving a Target Protein

a. Inclusion Body Fractionation Selection Using lac
Headpiece Dimer Fusion Protein
The lac repressor "headpiece dimer" is a small protein containing two
headpiece domains connected by a short peptide linker; it binds the lac operator with
sufficient affinity that polypeptide fusions to this headpiece dimer remain bound to the
plasmid that encodes them throughout an affinity purification process (Gates et al., J. Mol.
Biol. 255:373-386 (1995)). This property can be exploited, as follows, to evolve mutant
proteins of interest with improved folding properties. The protein of interest can be
mammalian, yeast, bacterial, etc.

A fusion protein between the lac headpiece dimer and a target protein
sequence is constructed, for example, as disclosed by Gates (supra). This construct,
containing at least one lac operator, is mutagenized by technologies common in the art,
such as PCR mutagenesis, chemical mutagenesis, or oligonucleotide-directed mutagenesis
(Sambrook et al., Molecular Cloning, CSH Press (1987)). The resulting library is transformed
into a host cell, and expression of the fusion protein is induced, preferably with arabinose. An
extract or lysate is generated from a culture of the library expressing the construct. Insoluble
protein is separated from soluble protein/DNA complexes by centrifugation or affinity
chromatography, and the yield of soluble protein/DNA complexes is measured by
quantitative PCR (Sambrook et al., Molecular Cloning, CSH Press (1987)) of the plasmid.
Preferably, a reagent that is specific for properly folded protein, such as a monoclonal
antibody or a natural ligand, is used to purify soluble protein/DNA complexes. The plasmid
DNA from this step is isolated, subjected to RSR, and again expressed. These steps are
repeated until the yield of soluble protein/DNA complexes has reached a desired level of
improvement. Individual clones are then screened for retention of functional properties of the
protein of interest, such as enzymatic activity.
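The quantitative PCR readout in this cycle amounts to a ratio: plasmid copies recovered in soluble protein/DNA complexes relative to plasmid in a reference sample. The following is a minimal sketch that assumes ideal amplification (each cycle multiplies template by 1 + efficiency, so 100% efficiency gives the familiar 2^-ΔCt rule) and uses the total lysate as the reference. The function names, the reference choice and the Ct values are hypothetical illustrations, not part of the described protocol.

```python
def soluble_fraction(ct_soluble: float, ct_total: float, efficiency: float = 1.0) -> float:
    """Estimate the fraction of plasmid recovered in soluble protein/DNA complexes.

    Assumes both samples use the same primers and that each PCR cycle multiplies
    template by (1 + efficiency). A later Ct in the soluble sample means less template.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    delta_ct = ct_soluble - ct_total
    return (1.0 + efficiency) ** (-delta_ct)


def fold_improvement(fraction_round: float, fraction_start: float) -> float:
    """Ratio of soluble yield in a later RSR round to that of the starting library."""
    return fraction_round / fraction_start


# Hypothetical Ct values for the starting library and for a later round.
start = soluble_fraction(ct_soluble=28.0, ct_total=20.0)  # 2**-8
later = soluble_fraction(ct_soluble=23.0, ct_total=20.0)  # 2**-3
print(f"start={start:.4f} later={later:.4f} fold={fold_improvement(later, start):.0f}")
```

Under these assumptions a 5-cycle earlier Ct in the soluble fraction corresponds to a 32-fold improvement; with real primer efficiencies below 100%, the `efficiency` argument should be set from a standard curve.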
This technique is generically useful for evolving solubility and other properties,
such as cellular trafficking, of proteins heterologously expressed in a host cell of interest. For
example, one could select for efficient folding and nuclear localization of a protein fused to
the lac repressor headpiece dimer by encoding the protein on a plasmid carrying an SV40
origin of replication and a lac operator, and transiently expressing the fusion protein in a
mammalian host expressing T antigen. Purification of protein/DNA complexes from nuclear
Hirt extracts (Seed and Aruffo, Proc. Natl. Acad. Sci. (U.S.A.) 84:3365-3369 (1987)) would
allow one to select for proteins with efficient folding and nuclear localization.

b. Functional Expression of Protein Using Phage Display 

A problem often encountered in phage display methods such as those 
disclosed by O'Neil et al. (Current Biology 5:443-449 (1995)) is the inability to functionally
express a protein of interest on phage. Without being limited to any one theory, improper
folding of the protein of interest can be responsible for this problem. RSR can be used to
evolve a protein of interest for functional expression on phage. Typically, a fusion protein is
constructed between gene III or gene VIII and the target protein and then mutagenized, for
example by PCR mutagenesis. The mutagenized library is then expressed in a phage
display format, a phage stock is made, and these phage are affinity selected for those
bearing functionally displayed fusion proteins using an affinity matrix containing a known
ligand for the target protein. DNA from the functionally selected phage is purified, and the
displayed genes of interest are shuffled and recloned into the phage display format. The
selection, shuffling and recloning steps are repeated until the yield of phage with functional 
displayed protein has reached desired levels as defined, for example, by the fraction of 
phage that are retained on a ligand affinity matrix or the biological activity associated with the 
displayed phage. Individual clones are then screened to identify candidate mutants with 
improved display properties, desired level of expression, and functional properties of interest 
(e.g., ability to bind a ligand or receptor, lymphokine activity, enzymatic activity, etc.). 
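The stopping criterion above, the fraction of phage retained on a ligand affinity matrix, can be tracked round by round from input and eluted titers. This is a minimal sketch, assuming titers (plaque- or colony-forming units) are measured for each round; the titer values and the 1% target are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def retained_fraction(input_titer: float, eluted_titer: float) -> float:
    """Fraction of applied phage recovered from the ligand affinity matrix."""
    if input_titer <= 0:
        raise ValueError("input titer must be positive")
    return eluted_titer / input_titer


def first_round_reaching(rounds: Sequence[Tuple[float, float]], target: float) -> Optional[int]:
    """Return the 1-based round whose retained fraction first meets `target`, else None."""
    for i, (applied, eluted) in enumerate(rounds, start=1):
        if retained_fraction(applied, eluted) >= target:
            return i
    return None


# Hypothetical (input, eluted) titers over four rounds of selection, shuffling and recloning.
titers = [(1e11, 1e5), (1e11, 2e6), (1e11, 3e8), (1e11, 2e9)]
print(first_round_reaching(titers, target=0.01))  # round 4: 2e9 / 1e11 = 0.02
```

Retention depends on input titer, wash stringency and matrix capacity, so in practice the target would be set relative to a wild-type or non-displaying control phage rather than as an absolute number.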

In some embodiments of the invention, a functional screen or selection is 
used to identify an evolved protein not expressed on a phage. The target protein, which 
cannot initially be efficiently expressed in a host of interest, is mutagenized and a functional 
screen or selection is used to identify cells expressing functional protein. For example, the 
protein of interest may complement a function in the host cell, cleave a colorimetric 
substrate, etc. Recursive sequence recombination is then used to rapidly evolve improved 
functional expression from such a pool of improved mutants. 
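The recursive cycle that recurs throughout this section (mutagenize, screen or select, recombine the survivors, repeat until the desired improvement) can be illustrated with an in silico toy. This sketch is not a model of protein folding or expression: genes are bit strings, the "screen" is a hypothetical score counting matches to an arbitrary target, and uniform crossover between selected clones stands in for in vitro recombination such as DNA shuffling. It only shows the control flow of recursive mutation, selection and recombination; all parameter values are illustrative assumptions.

```python
import random
from typing import List, Optional

Gene = List[int]


def score(gene: Gene, target: Gene) -> int:
    """Hypothetical screen: number of positions matching an arbitrary target."""
    return sum(g == t for g, t in zip(gene, target))


def mutate(gene: Gene, rate: float, rng: random.Random) -> Gene:
    """Flip each position with probability `rate` (stand-in for mutagenesis)."""
    return [1 - g if rng.random() < rate else g for g in gene]


def recombine(a: Gene, b: Gene, rng: random.Random) -> Gene:
    """Uniform crossover: each position comes from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def evolve(length: int = 40, pool: int = 200, keep: int = 20, goal: float = 0.95,
           max_rounds: int = 50, seed: int = 0) -> Optional[int]:
    """Return the screening round in which a clone first meets `goal`, else None."""
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(length)]
    # Starting library: mutagenized copies of an all-zero "wild-type" gene.
    library = [mutate([0] * length, 0.1, rng) for _ in range(pool)]
    for rnd in range(1, max_rounds + 1):
        library.sort(key=lambda g: score(g, target), reverse=True)
        if score(library[0], target) >= goal * length:
            return rnd
        parents = library[:keep]  # screen: keep the top clones
        # Recombine random pairs of survivors, with light mutagenesis, for the next round.
        library = [mutate(recombine(rng.choice(parents), rng.choice(parents), rng), 0.01, rng)
                   for _ in range(pool)]
    return None


print(evolve())
```

Recombining the selected pool each round lets beneficial mutations found in different clones be combined, which is the advantage of recursive recombination over purely sequential point mutagenesis.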

For example, AMV reverse transcriptase is of particular commercial
importance because it is active at a higher temperature (42° C) and is more robust than
many other reverse transcriptases. However, it is difficult to express in prokaryotic hosts
such as E. coli, and is consequently expensive because it has to be purified from chicken
cells. Thus an evolved AMV reverse transcriptase that can be expressed efficiently in E. coli
is highly desirable.

In brief, the AMV reverse transcriptase gene (Papas et al., J. Cellular
Biochem. 20:95-103 (1982)) is mutagenized by any method common in the art. The library of
mutant genes is cloned into a colE1 plasmid (Amp resistant) under control of the lac
promoter in a polA12 (Ts) recA718 E. coli host (Sweasy et al., Proc. Natl. Acad. Sci. (U.S.A.)
90:4626-4630 (1993)). The library is induced with IPTG and shifted to the nonpermissive
temperature. This selects for functionally expressed reverse transcriptase genes under the
selective conditions used by Kim et al. (Proc. Natl. Acad. Sci. (U.S.A.) 92:684-688 (1995))
to select active HIV reverse transcriptase mutants. The selected AMV reverse transcriptase
genes are recovered by PCR using oligonucleotides flanking the cloned gene.
The resulting PCR products are subjected to in vitro RSR, selected as described above, and
the process is repeated until the level of functional expression is acceptable. Individual
clones are then screened for RNA-dependent DNA polymerization and other properties of
interest (e.g., half-life at room temperature, error rate). The candidate clones are subjected to
mutagenesis and then tested again to yield an AMV RT that can be expressed in E. coli at
high levels.

2. Evolved Chaperonins

In some embodiments of the invention, overexpression of a protein can lead
to the accumulation of folding intermediates that have a tendency to aggregate. Without
being limited to any one theory, the role of chaperonins is thought to be to stabilize such
folding intermediates against aggregation; thus, overexpression of a protein of interest can
overwhelm the capacity of chaperonins. Chaperonin genes can be evolved using
the techniques of the invention, either alone or in combination with the genes encoding the
protein of interest, to overcome this problem.

Examples of proteins of interest which are especially suited to this approach
include but are not limited to: cytokines; malarial coat proteins; T cell receptors; antibodies;
industrial enzymes (e.g., detergent proteases and detergent lipases); viral proteins for use in
vaccines; and plant seed storage proteins.

Sources of chaperonin genes include but are not limited to E. coli chaperonin
genes encoding such proteins as thioredoxin, GroES/GroEL, PapD, ClpB, DsbA, DsbB,
DnaJ, DnaK, and GrpE; mammalian chaperonins such as Hsp70, Hsp72, Hsp73,
Hsp40, Hsp60, Hsp10, Hdj1, TCP-1, Cpn60, and BiP; and the homologues of these chaperonin
genes in other species such as yeast (J.G. Wall and A. Plückthun, Current Biology 6:507-516
(1995); Hartl, Nature 381:571-580 (1996)). Additionally, heterologous genomic or cDNA
libraries can be used as libraries to select or screen for novel chaperonins.

In general, evolution of chaperonins is accomplished by first mutagenizing
chaperonin genes, screening or selecting for improved expression of the target protein of
interest, subjecting the mutated chaperonin genes to RSR, and repeating selection or
screening. As with all RSR techniques, this is repeated until the desired improvement of
expression of the protein of interest is obtained. Two exemplary approaches are provided
below.

a. Chaperonin Evolution in Trans to the Protein of Interest
With a Screen or Selection for Improved Function

In some embodiments the chaperonin genes are evolved independently of the
gene(s) for the protein of interest. The improvement in the evolved chaperonin can be
assayed, for example, by screening for enhancement of the activity of the target protein itself
or for the activity of a fusion protein comprising the target protein and a selectable or
screenable protein (e.g., GFP, alkaline phosphatase or beta-galactosidase).

b. Chaperonin Operon in cis

In some embodiments, the chaperonin genes and the target protein genes are
encoded on the same plasmid, but not necessarily evolved together. For example, a lac
headpiece dimer can be fused to the protein target to allow for selection of plasmids that
encode soluble protein. Chaperonin genes are provided on this same plasmid ("cis") and are
shuffled and evolved rather than the target protein. Similarly, the chaperonin genes can be
cloned onto a phagemid plasmid that encodes a gene III or gene VIII fusion with a protein of
interest. The cloned chaperonins are mutagenized and, as with the selection described
above, phage expressing functionally displayed fusion protein are isolated on an affinity
matrix. The chaperonin genes from these phage are shuffled, and the cycle of selection,
mutation and recombination is applied recursively until fusion proteins are efficiently
displayed in functional form.

3. Improved Intracellular Localization

Many overexpressed proteins of biotechnological interest are secreted into
the periplasm or media to give advantages in purification or activity assays. Optimization for
high-level secretion is difficult because the process is controlled by many genes, and hence
optimization may require multiple mutations affecting the expression level and structure of
several of these components. Protein secretion in E. coli, for example, is known to be
influenced by many proteins, including: a secretory ATPase (SecA), a translocase complex
(SecB, SecD, SecE, SecF, and SecY), chaperonins (DnaK, DnaJ, GroES, GroEL), signal
peptidases (LepB, LspA, Ppp), specific folding catalysts (DsbA), and other proteins of less
well defined function (e.g., Ffh, FtsY) (Sandkvist et al., Curr. Opin. Biotechnol. 7:505-511
(1996)). Overproduction of wild-type or mutant copies of the genes for these proteins can
significantly increase the yield of mature secreted protein. For example, overexpression of
secY or secY4 significantly increased the periplasmic yield of mature human IL6 from a
hIL6-pre-OmpA fusion (Perez-Perez et al., Bio/Technology 12:178-180 (1994)). Analogously,
overexpression of DnaK/DnaJ in E. coli improved the yield of secreted human granulocyte
colony stimulating factor (Perez-Perez et al., Biochem. Biophys. Res. Commun. 210:254-259
(1995)).

RSR provides a route to evolution of one or more of the above-named
components of the secretory pathway. The following strategy is employed to optimize protein
secretion in E. coli. Variations on this method, suitable for application to Bacillus subtilis,
Pseudomonas, Saccharomyces cerevisiae, Pichia pastoris, mammalian cells and other hosts,
are also described. The general protocol is as follows.

One or more of the genes named above are obtained by PCR amplification
from E. coli genomic DNA using known flanking sequence, and cloned in an ordered array
into a plasmid or cosmid vector. These genes do not in general occur naturally in clusters,
and hence these constructs will comprise artificial gene clusters. The genes may be cloned
under the control of their natural promoter or under the control of another promoter such as
the lac, tac, arabinose, or trp promoters. Typically, rare restriction sites such as Sfi I are
placed between the genes to facilitate ordered reassembly of shuffled genes as described in
the methods of the invention.

The gene cluster is mutagenized and introduced into a host cell in which the
gene of interest can be inducibly expressed. Expression of the target gene to be secreted
and of the cloned genes is induced by standard methods for the promoter of interest (e.g.,
addition of 1 mM IPTG for the lac promoter). The efficiency of protein secretion by a library
of mutants is measured, for example, by colony blotting (Skerra et al., Anal.
Biochem. 196:151-155 (1991)). Those colonies expressing the highest levels of secreted
protein (the top 0.1-10%, preferably the top 1%) are picked. Plasmid DNA is prepared from
these colonies and shuffled according to any of the methods of the invention.
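The picking rule (the top 0.1-10% of colonies, preferably the top 1%) is a ranking step. This is a minimal sketch, assuming each colony already has a numeric colony-blot signal; the colony identifiers and signal values are hypothetical, and rounding the pick size up (with at least one colony) is an illustrative choice.

```python
import math
from typing import Dict, List


def pick_top(signals: Dict[str, float], fraction: float = 0.01) -> List[str]:
    """Return IDs of the highest-signal colonies, always keeping at least one.

    `fraction` is the share of the library to pick (0.01 = top 1%).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n = max(1, math.ceil(len(signals) * fraction))
    ranked = sorted(signals, key=signals.get, reverse=True)
    return ranked[:n]


# Hypothetical blot intensities for a library of 500 colonies.
blots = {f"colony_{i:03d}": float(i % 97) for i in range(500)}
print(pick_top(blots, fraction=0.01))
```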

Preferably, each individual gene is amplified from the population and
subjected to RSR. The fragments are digested with Sfi I (sites introduced between each gene
with nonpalindromic overhangs designed to promote ordered reassembly by DNA ligase) and
ligated together, preferably at high dilution (<1 ng/microliter) to promote formation of
covalently closed relaxed circles. Each of the PCR-amplified gene populations may be
shuffled prior to reassembly into the final gene cluster. The ligation products are transformed
back into the host of interest, and the cycle of selection and RSR is repeated.
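Sfi I recognizes GGCCNNNNNGGCC and cuts as GGCCNNNN^NGGCC, leaving a 3-nucleotide 3' overhang made of the three central N positions, so the overhang at each junction of an artificial cluster can be chosen freely. For ligation to restore the original gene order, each junction's overhang must differ from every other junction's overhang and from their reverse complements. This is a minimal sketch of that check, assuming plain uppercase DNA strings; the function names and example sites are hypothetical. A 3-nt overhang can never equal its own reverse complement, but the palindrome test is kept for generality.

```python
from itertools import combinations
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sfi_overhang(site: str) -> str:
    """Return the 3-nt overhang of a 13-bp Sfi I site GGCCNNNNNGGCC.

    The top strand is cut after position 8 and the bottom strand after
    position 5 (top-strand coordinates), so positions 6-8 form the overhang.
    """
    site = site.upper()
    if len(site) != 13 or not (site.startswith("GGCC") and site.endswith("GGCC")):
        raise ValueError(f"not an Sfi I site: {site}")
    return site[5:8]


def ordered_assembly_ok(sites: List[str]) -> bool:
    """True if no junction's ends can ligate to another junction's ends."""
    overhangs = [sfi_overhang(s) for s in sites]
    for oh in overhangs:
        if oh == revcomp(oh):  # palindromic: ends could join head-to-head
            return False
    for a, b in combinations(overhangs, 2):
        if a == b or a == revcomp(b):  # cross-compatible junctions
            return False
    return True


# Hypothetical junctions between four genes of an artificial cluster.
good = ["GGCCAACGTGGCC", "GGCCATTGAGGCC", "GGCCACCTAGGCC"]
bad = ["GGCCAACGTGGCC", "GGCCTACGTGGCC"]  # both junctions give "ACG"
print(ordered_assembly_ok(good), ordered_assembly_ok(bad))
```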
Analogous strategies can be employed in other hosts such as Pseudomonas,
Bacillus subtilis, yeast and mammalian cells. The homologs of the E. coli genes listed above
are targets for optimization, and indeed many of these homologs have been identified in
other species (Pugsley, Microbiol. Rev. 57:50-108 (1993)). In addition to these homologs,
other components such as the six polypeptides of the signal recognition particle, the
translocating chain-associating membrane protein (TRAM), BiP, the Ssa proteins and other
hsp70 homologs, and prsA (B. subtilis) (Simonen and Palva, Microbiol. Rev. 57:109-137
(1993)) are targets for optimization by RSR. In general, replicating episomal vectors such as
SV40-neo (Sambrook et al., Molecular Cloning, CSH Press (1987); Northrup et al., J. Biol.
Chem. 268(4):2917-2923 (1993)) for mammalian cells or 2 micron or ars plasmids for yeast
(Strathern et al., The Molecular Biology of the Yeast Saccharomyces, CSH Press (1982)) are
used. Integrative vectors such as pJM103, pJM113 or pSGMU2 are preferred for B. subtilis
(Perego, Chap. 42, pp. 615-624 in: Bacillus subtilis and Other Gram-Positive Bacteria, A.
Sonenshein, J. Hoch, and R. Losick, eds., 1993).

For example, an efficiently secreted thermostable DNA polymerase can be 
evolved, thus allowing the performance of DNA polymerization assays with little or no 
purification of the expressed DNA polymerase. Such a procedure would be preferred for the 
expression of libraries of mutants of any protein that one wished to test in a high throughput 
assay, for example any of the pharmaceutical proteins listed in Table I, or any industrial 
enzyme. Initial constructs are made by fusing a signal peptide such as that from STII or 
OmpA to the amino terminus of the protein to be secreted. A gene cluster of cloned genes 
believed to act in the secretory pathway of interest is mutagenized and coexpressed with
the target construct. Individual clones are screened for expression of the gene product. The
secretory gene clusters from improved clones are recovered, recloned, and introduced
back into the original host. Preferably, they are first subjected to mutagenesis before the
process is repeated. This cycle is repeated until the desired improvement in expression of
secreted protein is achieved.
IV. Evolved Polypeptide Properties

A. Evolved Transition State Analog and Substrate Binding

There are many enzymes of industrial interest that have substantially 
suboptimal activity on the substrate of interest. In many of these cases, the enzyme
obtained from nature is required either to work under conditions that are very different from
the conditions under which it evolved or to have activity towards a substrate that is different
from the natural substrate.

The application of evolutionary technologies to industrial enzymes is often
significantly limited by the types of selections that can be applied and the modest numbers of
mutants that can be surveyed in screens. Selection of enzymes or catalytic antibodies,
expressed in a display format, for binding to transition state analogs (McCafferty et al., Appl.
Biochem. Biotechnol. 47:137-171 (1994)) or substrate analogs (Janda et al., Proc. Natl.
Acad. Sci. (U.S.A.) 91:2532-2536 (1994)) represents a general strategy for selecting
mutants with improved catalytic efficiency.

Phage display (O'Neil et al., Current Biology 5:443-449 (1995)) and the other
display formats (Gates et al., J. Mol. Biol. 255:373-386 (1995); Mattheakis et al., Proc. Natl.
Acad. Sci. (U.S.A.) 91:9022-9026 (1994)) described herein represent general methodologies
for applying affinity-based selections to proteins of interest. For example, Matthews and
Wells (Science 260:1113-1117 (1993)) have used phage display of a protease substrate to
select improved substrates. Display of active enzymes on the surface of phage, on the other 
hand, allows selection of mutant proteins with improved transition state analog binding. 
Improvements in affinity for transition state analogs correlate with improvements in catalytic 
efficiency. For example, Patten et al., Science 271:1086-1091 (1996) have shown that 
improvements in affinity of a catalytic antibody for its hapten are well correlated with 
improvements in catalytic efficiency, with an 80-fold improvement in kcat/Km being achieved 
for an esterolytic antibody. 
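The correlation between transition-state analog binding and catalytic efficiency has a textbook rationale in transition-state theory, given here as a hedged aside rather than as part of the described method. If k_uncat is the rate constant of the uncatalyzed reaction, the apparent dissociation constant of the enzyme for the transition state is approximately k_uncat divided by kcat/Km. For two variants catalyzing the same reaction (same k_uncat), the ratio of their catalytic efficiencies then tracks the inverse ratio of their transition-state dissociation constants:

```latex
K_{TS} \approx \frac{k_{\mathrm{uncat}}}{k_{\mathrm{cat}}/K_m}
\qquad\Longrightarrow\qquad
\frac{(k_{\mathrm{cat}}/K_m)_{\mathrm{mutant}}}{(k_{\mathrm{cat}}/K_m)_{\mathrm{parent}}}
\approx \frac{K_{TS,\mathrm{parent}}}{K_{TS,\mathrm{mutant}}}
```

A transition state analog only approximates the true transition state, so its measured affinity is a proxy for K_TS; this is why affinity gains are expected to correlate with, rather than exactly equal, gains in kcat/Km.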

For example, an enzyme used in antibiotic biosynthesis can be evolved for
new substrate specificity and activity under desired conditions using phage display
selections. Some antibiotics are currently made by chemical modifications of biologically
produced starting compounds. Complete biosynthesis of the desired molecules is currently
impractical because of the lack of an enzyme with the required enzymatic activity and
substrate specificity (Skatrud, TIBTECH 10:324-329, September 1992). For example,
7-aminodeacetoxycephalosporanic acid (7-ADCA) is a precursor for semi-synthetically
produced cephalosporins. 7-ADCA is made by a chemical ring expansion of penicillin G
followed by enzymatic deacylation of the phenylacetyl group. 7-ADCA can be made
enzymatically from deacetylcephalosporin C (DAOC V), which could in turn be derived from
penicillin V by enzymatic ring expansion if a suitably modified penicillin expandase could be
evolved (Cantwell et al., Curr. Genet. 17:213-221 (1990)). Thus, 7-ADCA could in principle
be produced enzymatically from penicillin V using a modified penicillin N expandase, such as
mutant forms of the S. clavuligerus cefE gene (Skatrud, TIBTECH 10:324-329, September
1992). However, penicillin V is not accepted as a substrate by any known expandase with
sufficient efficiency to be commercially useful. As outlined below, RSR techniques of the
invention can be used to evolve the penicillin expandase encoded by cefE or other
expandases so that they will use penicillin V as a substrate.

Phage display or other display format selections are applied to this problem by
expressing libraries of cefE penicillin expandase mutants in a display format, selecting for
binding to substrates or transition state analogs, and applying RSR to rapidly evolve
high-affinity binders. Candidates are further screened to identify mutants with improved
enzymatic activity on penicillin V under desired reaction conditions, such as pH, temperature,
solvent concentration, etc. RSR is applied to further evolve mutants with the desired
expandase activity. A number of transition state analogs (TSAs) are suitable for this
reaction. The following structure is the initial TSA that is used for selection of the display
library of cefE mutants:
[Structure of the initial transition state analog (not reproduced)]
Libraries of the known penicillin expandases (Skatrud, TIBTECH 10:324-329 (1992);
Cantwell et al., Curr. Genet. 17:213-221 (1990)) are made as described herein.
The display library is subjected to selection for binding to penicillin V and/or to the transition
state analog given above for the conversion of penicillin V to DAOC V. These binding
selections may be performed under non-physiological reaction conditions, such as elevated
temperature, to obtain mutants that are active under the new conditions. RSR is applied to
evolve mutants with 2 - 10* fold improvement in binding affinity for the selecting ligand.
When the desired level of improved binding has been obtained, candidate mutants are
expressed in a high-throughput format, and specific activity for expanding penicillin V to
DAOC V is quantitatively measured. Recombinants with improved enzymatic activity are
mutagenized and the process repeated to further evolve them.

Retention of TSA binding by a displayed enzyme (e.g., phage display, lac
headpiece dimer, polysome display, etc.) is a good selection for retention of the overall
integrity of the active site and hence can be exploited to select for mutants that retain
activity under conditions of interest. Such conditions include but are not limited to: different
pH optima, broader pH optima, activity in altered solvents such as DMSO (Seto et al., DNA
Sequence 5:131-140 (1995)) or formamide (Chen et al., Proc. Natl. Acad. Sci. (U.S.A.)
90:5618-5622 (1993)), altered temperature, improved shelf life, altered or broadened
substrate specificity, or protease resistance. As a further example, the evolution of a
p-nitrophenyl esterase using a mammalian display format is provided below.
B. Improvement of DNA and RNA Polymerases

Of particular commercial importance are improved polymerases for use in 
nucleic acid sequencing and polymerase chain reactions. The following properties are 
attractive candidates for improvement of a DNA sequencing polymerase: (1) suppression of
termination by inosine in labelled primer format (H. Dierick et al., Nucleic Acids Res.
21:4427-4428 (1993)); (2) more normalized peak heights, especially with fluorescently
labelled dideoxy terminators (Parker et al., BioTechniques 19:116-121 (1995)); (3) better
sequencing of high GC content DNA (>60% GC) by, for example, tolerating >10% DMSO (D.
Seto et al., DNA Sequence 5:131-140 (1995); Schekll et al., BioTechniques 19(5):691-694
(1995)); or (4) improved acceptance of novel base analogs such as inosine, 7-deaza dGTP
(Dierick et al., Nucleic Acids Res. 21:4427-4428 (1993)) or other novel base analogs that
improve the above properties.

Novel sequencing formats have been described that use matrix-assisted
laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry to resolve dideoxy
ladders (Smith, Nature Biotechnology 14:1084-1085 (1996)). Smith's recent review notes
that fragmentation of the DNA is the singular feature limiting the development of this
method as a viable alternative to standard gel electrophoresis for DNA sequencing. Base
analogs that stabilize the N-glycosidic bond by modification of the purine bases to
7-deaza analogs (Kirpekar et al., Rapid Commun. Mass Spectrom. 9:525-531 (1995)) or of
the 2' hydroxyl (such as 2'-H or 2'-F) "relieve greatly the mass range limitation" of this
technique (Smith, 1996). Thus, evolved polymerases that can efficiently incorporate these and
other base analogs conferring resistance to fragmentation under MALDI-TOF conditions are
valuable innovations.

Other polymerase properties of interest for improvement by RSR include: low
fidelity thermostable DNA polymerase for more efficient mutagenesis, or as a useful correlate
for acceptance of base analogs for the purposes described above; higher fidelity polymerase
for PCR (Lundberg et al., Gene 108:1-6 (1991)); higher fidelity reverse transcriptase for
retroviral gene therapy vehicles, to reduce mutation of the therapeutic construct and of the
retrovirus; and improved PCR of GC-rich DNA and PCR with modified bases (S. Turner and
F. J. Jenkins, BioTechniques 19(1):48-52 (1995)).

Thus, in some embodiments of the invention, libraries of mutant polymerase
genes are screened by direct high-throughput screening for improved sequencing properties.
The best candidates are then subjected to RSR. Briefly, mutant libraries of candidate
polymerases such as Taq polymerase are constructed using standard methods such as PCR
mutagenesis (Caldwell et al., PCR Methods Appl. 2:28-33 (1992)) and/or cassette mutagenesis
(Sambrook et al., Molecular Cloning, CSH Press (1987)). Known beneficial mutations are
incorporated into these Taq DNA polymerase libraries, such as the active site residue from T7
polymerase that improves acceptance of dideoxynucleotides (Tabor and Richardson, J. Biol.
Chem. 265:8322-8328 (1990)) and mutations that inactivate the 5'-3' exonuclease activity
(R.S. Rano, BioTechniques 18:390-396 (1995)). The reassembly PCR technique described
above, for example, is especially suitable for this problem. Similarly, chimeric polymerase
libraries are made by breeding existing thermophilic polymerases, Sequenase, and E. coli
Pol I with each other using the bridge oligonucleotide methods described above. The libraries
are expressed in formats wherein manual or robotic colony picking is used to replica pick
individual colonies into 96-well plates, where small cultures are grown and polymerase
expression is induced.
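Replica picking into 96-well plates needs a consistent map from colony index to plate and well, so that the expression culture, the polymerase preparation and later sequencing data can be traced back to the same clone. This is a minimal sketch, assuming row-major filling (A1, A2, ..., A12, B1, ...) and no reserved control wells; both are illustrative choices, not requirements stated in the text.

```python
from typing import Tuple

ROWS = "ABCDEFGH"
COLS = 12
WELLS_PER_PLATE = len(ROWS) * COLS  # 96


def well_for(index: int) -> Tuple[int, str]:
    """Map a 0-based colony index to (1-based plate number, well label)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    plate, pos = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(pos, COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


def index_for(plate: int, well: str) -> int:
    """Inverse of well_for: recover the colony index from plate and well."""
    row = ROWS.index(well[0].upper())  # raises ValueError for a bad row letter
    col = int(well[1:]) - 1
    if not 0 <= col < COLS:
        raise ValueError(f"bad well: {well}")
    return (plate - 1) * WELLS_PER_PLATE + row * COLS + col


for i in (0, 11, 12, 95, 96):
    print(i, well_for(i))
```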

A high-throughput, small-scale, simple purification of the polymerase expressed in
each well is performed. For example, simple single-step purifications of His-tagged Taq
expressed in E. coli have been described (Smirnov et al., Russian J. Bioorg. Chem.
21(5):341-342 (1995)) and could readily be adapted for a 96-well expression and purification
format.

A high-throughput sequencing assay is used to perform sequencing reactions
with the purified samples. The data are analyzed to identify mutants with improved
sequencing properties according to any of these criteria: higher quality ladders on GC-rich
templates, especially greater than 60% GC, including such points as fewer artifactual
termination products and stronger signals than given with the wild-type enzyme; less
termination of reactions by inosine in primer-labelled reactions, e.g., fluorescent labelled
primers; less variation in incorporation of signals in reactions with fluorescent dideoxy
nucleotides at any given position; longer sequencing ladders than obtained with the wild-type
enzyme, for example by about 20 to 100 nucleotides; improved acceptance of other known
base analogs such as 7-deaza purines; and improved acceptance of new base analogs from
combinatorial chemistry libraries (see, for example, Hogan, Nature 384(Suppl.):17 (1996)).
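Of the criteria listed, "less variation in incorporation of signals" with fluorescent dideoxynucleotides is easy to reduce to a number: the coefficient of variation (standard deviation divided by mean) of called peak heights, where lower means more even signal. This sketch ranks mutants on that single criterion. It assumes peak heights have already been extracted from each trace, and the mutant names and heights are hypothetical. A real screen would weigh several of the criteria above, such as read length and performance on GC-rich templates.

```python
import statistics
from typing import Dict, List, Tuple


def peak_cv(heights: List[float]) -> float:
    """Coefficient of variation of peak heights (lower = more even signal)."""
    if len(heights) < 2:
        raise ValueError("need at least two peaks")
    return statistics.stdev(heights) / statistics.fmean(heights)


def rank_by_uniformity(traces: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (mutant, CV) pairs sorted from most to least uniform."""
    return sorted(((name, peak_cv(h)) for name, h in traces.items()), key=lambda t: t[1])


traces = {
    "wild_type": [900.0, 150.0, 1200.0, 300.0, 1000.0, 250.0],
    "mutant_A": [600.0, 500.0, 700.0, 550.0, 650.0, 600.0],
    "mutant_B": [800.0, 300.0, 900.0, 400.0, 850.0, 350.0],
}
for name, cv in rank_by_uniformity(traces):
    print(f"{name}: CV = {cv:.2f}")
```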

The best candidates are then subjected to mutagenesis, and then selected or 
screened for the improved sequencing properties described above.

In another embodiment, a screen or selection is performed as follows. The
replication of a plasmid can be placed under obligate control of a polymerase expressed in
E. coli or another microorganism. The effectiveness of this system has been demonstrated for
making plasmid replication dependent on mammalian polymerase β (Sweasy et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 90:4626-4630 (1993)), Taq polymerase (Suzuki et al., Proc.
Natl. Acad. Sci. (U.S.A.) 93:9670-9675 (1996)), or HIV reverse transcriptase (Kim et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 92:684-688 (1995)). The polymerase gene is placed on a
plasmid bearing a colE1 origin and expressed under the control of an arabinose promoter.
The library is enriched for active polymerases essentially as described by Suzuki et al.
(supra), with polymerase expression being induced by the presence of arabinose in the
culture.

A further quantitative screen utilizes the presence of GFP (green fluorescent
protein) on the same plasmid, replica plating onto arabinose at the nonpermissive
temperature in the absence of a selective antibiotic, and using a fluorimeter to quantitatively
measure fluorescence of each culture. GFP activity correlates with plasmid stability and
copy number, which is in turn dependent on expression of active polymerase.
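Because GFP signal in this screen is read as a proxy for plasmid stability and copy number, one reasonable readout is background-subtracted fluorescence per unit of culture density, so that denser cultures do not score higher simply for having more cells. This is a minimal sketch, assuming fluorimeter readings and OD600 values from the same cultures plus a medium-only blank; the culture names, values and the OD normalization itself are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def normalized_gfp(fluor: float, od600: float, blank_fluor: float = 0.0) -> float:
    """Background-subtracted fluorescence per OD600 unit."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return max(fluor - blank_fluor, 0.0) / od600


def rank_cultures(readings: Dict[str, Tuple[float, float]], blank_fluor: float) -> List[str]:
    """Order cultures by normalized GFP, highest first."""
    return sorted(
        readings,
        key=lambda w: normalized_gfp(readings[w][0], readings[w][1], blank_fluor),
        reverse=True,
    )


# Hypothetical (fluorescence, OD600) pairs from cultures grown without antibiotic.
readings = {"A1": (5200.0, 0.80), "A2": (3100.0, 0.20), "A3": (900.0, 0.75)}
print(rank_cultures(readings, blank_fluor=100.0))
```

Here culture A2 ranks first despite its lower raw signal, because its fluorescence per cell is highest.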

A polymerase with a very high error rate would be a superior sequencing
enzyme, as it would have a more normalized rate of incorporation of nucleotide analogs such
as the currently used fluorescently labelled dideoxynucleotides, because it will have reduced
specificity and selectivity. The error rates of currently used polymerases are orders of
magnitude lower than what can be detected given the resolving power of the gel
systems. Substantially higher error rates would not be detected by current gel systems, and
thus there is a large window of opportunity to increase the "sloppiness" of
the enzyme. An error-prone cycling polymerase would have other uses, such as
hypermutagenesis of genes by PCR.

In one embodiment, the experimental method developed by Suzuki (Suzuki et al., Proc. Natl.
Acad. Sci. (U.S.A.) 93:9670-9675 (1996)) is used to make replication of a reporter gene
located 200-300 bases from the ColE1 origin directly under the control of the expressed
polymerase (Sweasy and Loeb, J. Bacteriol. 177:2923-2925 (1995); Sweasy et al.). The
reporter gene, lacZ alpha, carries one or more stop codons. The colonies are grown on
arabinose at the nonpermissive temperature, allowed to recover, and plated on selective
lactose minimal medium. Revertants of the stop codons in the reporter are selected; mutator
polymerases are enriched in this selection because their higher error rate increases
reversion of the stop codons in the lacZ alpha reporter. The polymerase genes from the
surviving colonies are subjected to RSR, and then the polymerase mutants are retransformed
into the indicator strain. Mutators can be visually screened by plating on arabinose/X-gal
plates at the nonpermissive temperature. Mutator polymerases will give rise to colonies with a
high frequency of blue papillae due to reversion of the stop codon(s). Candidate papillators
can be rescreened by picking a non-papillating region of the most heavily papillated colonies
(i.e., "best" colonies) and replating on the arabinose/X-gal indicator medium to further screen
for colonies with increased papillation rates. These steps are repeated until a desired
reversion rate is achieved (e.g., 10" J to 10" 3 mutations per base pair per replication).

Colonies that exhibit high-frequency papillation are candidates for encoding
an error-prone polymerase. These candidates are screened for improved sequencing
properties essentially as for the high throughput screen described above. Briefly, mutant Taq 
proteins are expressed and purified in a 96-well format. The purified proteins are used in 
sequencing reactions and the sequence data are analyzed to identify mutants that exhibit the 
improvements outlined herein. Mutants with improved properties are subjected to RSR and 
rescreened for further improvements in function. 

In some embodiments, GFP containing stop codons is used for the construction instead of lacZ
alpha containing stop codons. Cells in which the stop codons in GFP have reverted are
selected by fluorescence-activated cell sorting (FACS). In general, FACS selection is
performed by gating on the brightest approximately 0.1-10% of cells, preferably the top 0.1-1%,
which are collected according to a protocol similar to that of Dangl et al. (Cytometry
2(6):395-401 (1982)). In other embodiments, the polA gene is flanked with lox sites or other
targets of a site-specific recombinase. The recombinase is induced, thus allowing one to
inducibly delete the polA gene (Mulbery et al., Nucleic Acids Res. 23:485-490 (1995)). This
would allow one to perform "Loeb-type" selections at any temperature and in any host. For
example, one could set up such a selection in a recA-deficient mesophile or thermophile by
placing the polA homologue in an inducibly deletable format, and thus apply the selection for
active polymerase under more general conditions.
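The gating rule described above (collect roughly the brightest 0.1-10% of events, preferably the top 0.1-1%) amounts to a top-fraction selection on per-cell fluorescence. The Python sketch below assumes the intensities have already been recorded as a list; `gate_top_fraction` is a hypothetical helper, not part of any sorter's software.

```python
def gate_top_fraction(intensities, fraction):
    """Return the indices of the brightest `fraction` of events.

    `fraction` is a proportion, e.g. 0.01 for the top 1%.
    At least one event is always kept (if any exist).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(intensities) * fraction))
    # Rank event indices by fluorescence, brightest first.
    ranked = sorted(range(len(intensities)), key=lambda i: intensities[i], reverse=True)
    # Report the gated events in their original acquisition order.
    return sorted(ranked[:n_keep])


# Hypothetical example: 1,000 recorded events, keep the top 1%.
events = [float(i) for i in range(1000)]
gated = gate_top_fraction(events, 0.01)
```

A real instrument applies the equivalent intensity threshold during the sort; ranking a recorded sample in this way is one possible way to choose that threshold.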

In further embodiments, this general system is preferred for directed in vivo
mutagenesis of genes. The target gene is cloned into the region near a plasmid origin of
replication such that its replication is obligatorily under the control of the error-prone
polymerase. The construct is passaged through a polA(ts) recA strain and grown at the
nonpermissive temperature, thus specifically mutagenizing the target gene while the rest of the
plasmid is replicated with high fidelity.
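The rationale for confining mutagenesis to the target gene can be checked with simple arithmetic: the expected number of mutations is roughly length × per-base error rate × rounds of replication, as long as that number stays small per site. The lengths, error rates and replication count below are hypothetical, chosen only to illustrate the contrast between an error-prone and a high-fidelity polymerase.

```python
def expected_mutations(length_bp, error_rate_per_bp, replications):
    """Expected mutation count: length x per-base error rate x rounds of replication.

    A linear approximation, valid while the expected count per base is small.
    """
    return length_bp * error_rate_per_bp * replications


# Hypothetical values, for illustration only.
target = expected_mutations(2500, 1e-4, 20)    # gene replicated by the error-prone polymerase
backbone = expected_mutations(3000, 1e-9, 20)  # rest of plasmid, replicated with high fidelity
```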

In other embodiments, selection is based on the ability of mutant DNA polymerases to PCR
amplify DNA under altered conditions or with base analogs. Each mutant polymerase acts on the
template that encodes it in a PCR amplification, so that the genes encoding the more effective
polymerases are differentially amplified.
In brief, an initial library of mutants is replica plated. Polymerase preparations
are made in a 96-well format, and crude plasmid preparations are made of the same set. Each
plasmid prep is PCR-amplified using the polymerase prep derived from that plasmid under
the conditions for which one wishes to optimize the polymerase (e.g., added DMSO or
formamide; altered denaturation or extension temperature; altered buffer salts; PCR with
base analogs such as alpha-thio dNTPs for use with mass spectrometry sequencing; PCR of
GC-rich DNA (>60% GC); PCR with novel base analogs such as 7-deaza purines, 2'-fluoro
dNTPs, or rNTPs; PCR with inosine; etc.). The amplified genes are pooled, cloned, and
subjected to mutagenesis, and the process is repeated until an improvement is achieved.
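Because each plasmid is amplified by the polymerase it encodes, pooling the reactions enriches genes for the more effective polymerases. The sketch below is a deliberately simple model that assumes equal starting template, a constant per-cycle efficiency for each preparation and no plateau; the efficiencies are hypothetical.

```python
def pooled_fractions(efficiencies, cycles):
    """Share of each variant's gene in the pool after self-amplification.

    Each plasmid is amplified by its own polymerase; an efficiency e means the
    template increases (1 + e)-fold per cycle. Equal starting template is assumed.
    """
    yields = [(1.0 + e) ** cycles for e in efficiencies]
    total = sum(yields)
    return [y / total for y in yields]


# Hypothetical per-cycle efficiencies for three mutant polymerase preparations.
shares = pooled_fractions([0.9, 0.8, 0.5], cycles=25)
```

Real PCR yields plateau, so this exponential model overstates the enrichment achievable in a single round.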
C. Evolved Phosphonatase 

Alkaline phosphatase is a widely used reporter enzyme for ELISA assays,
protein fusion assays, and, in a secreted form, as a reporter gene for mammalian cells. The
chemical lability of p-nitrophenyl phosphate (pNPP) substrates and the existence of cellular
phosphatases that cross-react with pNPP are important limitations on the sensitivity of
assays using this reporter gene. A reporter gene with superior signal-to-noise properties can
be developed based on hydrolysis of p-nitrophenyl phosphonates, which are far more stable
to base-catalyzed hydrolysis than the corresponding phosphates. Additionally, there are far
fewer naturally occurring cellular phosphonatases than alkaline phosphatases. Thus a
p-nitrophenyl phosphonatase is an attractive replacement for alkaline phosphatase because
the background due to chemical and enzymatic hydrolysis is much lower. This will allow one
to make ELISAs more sensitive for detecting very small concentrations of antigen.
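The sensitivity argument can be made concrete with a first-order model of background hydrolysis plus a constant contribution from cross-reacting cellular enzymes. All rate constants and amounts below are hypothetical placeholders chosen to show the trend; they are not measured values for pNPP or for p-nitrophenyl phosphonate.

```python
import math


def background_product(substrate, k_spontaneous, hours, cellular_rate=0.0):
    """Product formed without the reporter enzyme.

    Spontaneous hydrolysis is modeled as first order in substrate; cross-reacting
    cellular enzymes add a constant rate (product per hour).
    """
    spontaneous = substrate * (1.0 - math.exp(-k_spontaneous * hours))
    return spontaneous + cellular_rate * hours


def signal_to_background(reporter_product, substrate, k_spontaneous, hours, cellular_rate=0.0):
    """Ratio of reporter-generated product to background product."""
    return reporter_product / background_product(substrate, k_spontaneous, hours, cellular_rate)


# Hypothetical comparison: same reporter signal, lower spontaneous rate and no
# cross-reacting cellular activity for the phosphonate substrate.
phosphate = signal_to_background(1.0, 1000.0, 1e-4, 2.0, cellular_rate=0.05)
phosphonate = signal_to_background(1.0, 1000.0, 1e-6, 2.0)
```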

Chen et al. (J. Mol. Biol. 234:165-178 (1993)) have shown that a Staph.
aureus beta-lactamase can hydrolyze p-nitrophenyl phosphonate esters with single-turnover
kinetics. The active-site Ser70 (the active-site nucleophile for beta-lactam hydrolysis) forms
a covalent intermediate with the substrate. This is analogous to the first step in hydrolysis of
beta-lactams, and this enzyme can be evolved by RSR to hydrolyze phosphonates by a
mechanism analogous to beta-lactam hydrolysis. Metcalf and Wanner have described a
cryptic phosphonate-utilizing operon (phn) in E. coli, and have constructed strains bearing
deletions of the phn operon (J. Bacteriol. 175:3430-3442 (1993)). This paper discloses
selections for growth of E. coli on phosphate-free minimal media where the phosphorus is
derived from hydrolysis of alkyl phosphonates by genes in the phn operon. Thus, one could
select for evolved p-nitrophenyl phosphonatases that are active using biochemical selections
on defined minimal media. Specifically, an efficient phosphonatase is evolved as follows. A
library of mutants of the Staph. aureus beta-lactamase or of one of the E. coli phn enzymes
is constructed. The library is transformed into E. coli mutants wherein the phn operon has
been deleted, and selected for growth on phosphate-free MOPS minimal media containing
p-nitrophenyl phosphonate. RSR is applied to selected mutants to further evolve the enzyme
for improved hydrolysis of p-nitrophenyl phosphonates.
D. Evolved Detergent Proteases

Proteases and lipases are added in large quantities to detergents to 
enzymatically degrade protein and lipid stains on clothes. The incorporation of these 
enzymes into detergents has significantly reduced the need for surfactants in detergents with 
a consequent reduction in the cost of formulation of detergents and improvement in stain 
removal properties. Proteases with improved specific activity, improved range of protein 
substrate specificity, improved shelf life, improved stability at elevated temperature, and 
reduced requirements for surfactants would add value to these products. 

As an example, subtilisin can be evolved as follows. The cloned subtilisin
gene (von der Osten et al., J. Biotechnol. 28:55-68 (1993)) can be subjected to RSR using
growth selections on complex protein media, by virtue of secreted subtilisin degrading the
complex protein mixture. More specifically, libraries of subtilisin mutants are constructed in
an expression vector which directs the mutant protein to be secreted by Bacillus subtilis.
Bacillus hosts transformed with the libraries are grown in minimal media with a complex protein
formulation as carbon and/or nitrogen source. Subtilisin genes are recovered from fast
growers and subjected to RSR, then screened for improvement in a desired property.
E. Escape of Phage from a "Protein Net"

In some embodiments, selection for improved proteases is performed as
follows. A library of mutant protease genes is constructed on a display phage, and the phage are
grown in a multiwell format or on plates. The phage are overlaid with a "protein net" which
ensnares the phage. The net can consist of a protein or proteins engineered with surface
disulphides and then crosslinked with a library of peptide linkers. A further embodiment
employs an auxiliary matrix to further trap the phage. The phage are further incubated, then
washed to collect liberated phage wherein the displayed protease was able to liberate the
phage from the protein net. The protease genes are then subjected to RSR for further
evolution. A further embodiment employs a library of proteases encoded by, but not
displayed on, a phagemid wherein streptavidin is fused to pIII by a peptide linker. The library
of protease mutants is evolved to cleave the linker by selecting phagemids on a biotin
column between rounds of amplification.

In a further embodiment, the protease is not necessarily provided in a display
format. The host cells secrete the protease encoded by, but not surface-displayed by, a
phagemid, while constrained to a well, for example, in a microtiter plate. Phage display
format is preferred where an entire high-titer lysate is encased in a protein net matrix, so
that phage expressing active, broad-specificity proteases digest the matrix and are
liberated for the next round of amplification, mutagenesis, and selection.

In a further embodiment, the phage are not constrained to a well; rather,
protein-binding filters are used to make colony or plaque lifts, which are screened for activity
with chromogenic or fluorogenic substrates. Colonies or plaques corresponding to positive
spots on the filters are picked and the encoded protease genes are recovered by, for
example, PCR. The protease genes are then subjected to RSR for further evolution.

F. Screens for Improved Protease Activity

Peptide substrates containing fluorophores attached to the carboxy terminus
and fluorescence-quenching moieties on the amino terminus, such as those described by
Holskin et al. (Anal. Biochem. 227:148-155 (1995)) (e.g.,
(4-(4'-dimethylaminophenylazo)benzoyl)-Arg-...-amino-naphthalene-1-sulfonic acid), are used
to screen protease mutants for broadened or altered specificity. In brief, a library of peptide
substrates is designed with a fluorophore on the amino terminus and a potent fluorescence
quencher on the carboxy terminus, or vice versa. Supernatants containing secreted proteases
are incubated either separately with various members of the library or with a complex
cocktail. Those proteases which are highly active and have broad specificity will cleave the
majority of the peptides, thus releasing the fluorophore from the quencher and giving a
positive signal on a fluorimeter. This technique is amenable to a high-density multiwell format.
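Scoring a supernatant against the peptide library reduces to counting how many substrates are dequenched. In this sketch, which is an assumption rather than a procedure from the source, a peptide counts as cleaved when its fluorescence reaches a chosen multiple of its uncleaved baseline; the threshold of 3 and the peptide IDs are hypothetical.

```python
def specificity_breadth(signal, baseline, fold_threshold=3.0):
    """Score how broadly a protease cleaves a peptide substrate library.

    A peptide counts as cleaved when its fluorescence rises to at least
    `fold_threshold` times its uncleaved (quenched) baseline.
    Returns (fraction cleaved, sorted list of cleaved peptide IDs).
    """
    cleaved = sorted(p for p, f in signal.items() if f >= fold_threshold * baseline[p])
    return len(cleaved) / len(signal), cleaved


# Hypothetical readings for three library members after incubation.
signal = {"A": 30.0, "B": 5.0, "C": 12.0}
baseline = {"A": 10.0, "B": 10.0, "C": 4.0}
fraction, hits = specificity_breadth(signal, baseline)
```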

G. Improving pharmaceutical proteins using RSR

Table I lists proteins that are of particular commercial interest to the
pharmaceutical industry. These proteins are all candidates for RSR evolution to improve
function, such as specific activity, ligand binding, shelf life, reduction of side effects through
enhanced specificity, etc. All are well-suited to manipulation by the techniques of the
invention. Additional embodiments especially applicable to this list are described below.

First, high throughput methods for expressing and purifying libraries of mutant 
proteins, similar to the methods described above for Taq polymerase, are applied to the 
proteins of Table I. These mutants are screened for activity in a functional assay. For 
example, mutants of IL2 are screened for resistance to degradation by plasma or tissue 
proteases; or for retention of activity on the low affinity IL2 receptor but with loss of activity 
on the high affinity IL2 receptor. The genes from mutants with improved activity relative to 
wild-type are recovered, and subjected to RSR to improve the phenotype further. 

Preferably, the libraries are generated in a display format such that the mature
folded protein is physically linked to the genetic information that encodes it. Examples
include phage display using filamentous phage (O'Neil et al., Current Biology 5:443-449
(1995)) or bacteriophage lambda gene V display (Dunn, J. Mol. Biol. 248:497-506 (1995));
peptides on plasmids (Gates et al., J. Mol. Biol. 255:373-386 (1995)), where the polypeptide
of interest is fused to a lac headpiece dimer and the nascent translation product binds to a
lac operator site encoded on the plasmid or PCR product; and polysome display (Mattheakis
et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:9022-9026 (1994)), where ribosomes are stalled on
mRNA molecules such that the nascent polypeptide is exposed for interaction with cognate
ligands without disrupting the stalled ribosome/mRNA complex. Selected complexes are
subjected to RT-PCR to recover the genes.

When so displayed, affinity binding of the recombinant phage is often done 
using a receptor for the protein of interest. In some cases it is impractical to obtain purified 
receptor with retention of all desired biological characteristics (for example, 7- 
transmembrane (7-TM) receptors). In such cases, one could use cells expressing the 
receptor as the panning substrate. For example, Barry et al. (Nat. Med. 2:299-305 (1996)) 
have described successful panning of M13 libraries against whole cells to obtain phage that 
bind to the cells expressing a receptor of interest. This format could be generally applied to 
any of the proteins listed in Table I. 

In some embodiments, the following method can be used for selection. A
stock of phagemids encoding IFN alpha mutants, for example, can be used directly at
suitable dilution to stimulate cells. The biological effect on the cells can be read out by
standard assays (e.g., proliferation or viral resistance) or indirectly through the activation of a
reporter gene such as GFP (Crameri et al., Nat. Biotechnol. 14:315-319 (1996)) under the
control of an IFN-responsive promoter, such as an MHC class I promoter. In one embodiment,
phagemids that remain attached to the responsive cells after stimulation and reporter
expression are recovered by FACS purification of those cells. Preferably, the brightest cells
are collected. The phagemids are collected and their DNA subjected to RSR until the desired
level of improvement is achieved.

Thus, for example, IL-3 is prepared in one of these display formats and
subjected to RSR to evolve an agonist with a desired level of activity. A library of IL-3
mutants on a filamentous phage vector is created and affinity selected ("panned") against
purified IL-3 receptor to obtain mutants with improved affinity for the receptor or improved
potency of phage-displayed IL-3. The mutant IL-3 genes are recovered by PCR, subjected to
RSR, and recloned into the display vector. The cycle is repeated until the desired affinity or
agonist activity is achieved.
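The benefit of repeated panning rounds can be illustrated with an idealized model in which binders and background phage are each retained with a fixed probability per round, and survivors are re-amplified without bias. Under those assumptions the odds of picking a binder rise by the ratio of the two retention probabilities each round; the probabilities below are hypothetical.

```python
def binder_fraction(initial_fraction, p_binder, p_nonbinder, rounds):
    """Fraction of true binders after repeated panning rounds.

    Each round retains binders with probability p_binder and nonbinders
    (background) with probability p_nonbinder; survivors are assumed to be
    re-amplified without bias before the next round.
    """
    f = initial_fraction
    for _ in range(rounds):
        kept_binders = f * p_binder
        kept_background = (1.0 - f) * p_nonbinder
        f = kept_binders / (kept_binders + kept_background)
    return f


# Hypothetical: one binder per million phage, retained 1,000 times better than background.
after_three = binder_fraction(1e-6, 0.1, 1e-4, rounds=3)
```

With these inputs, a binder present at one in a million makes up about 99.9% of the pool after three rounds.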

Many proteins of interest are expressed as dimers or higher-order multimeric
forms. In some embodiments, the display formats described above are preferentially applied
to a single-chain version of the protein. Mutagenesis, such as RSR, can be used in these
display formats to evolve improved single-chain derivatives of multimeric factors which
initially have low but detectable activity. This strategy is described in more detail below.
H. Whole Cell Selections

In some embodiments, the eukaryotic cell is the unit of biological selection.
The following general protocol can be used to apply RSR to the improvement of proteins
using eukaryotic cells as the unit of selection: (1) transfection or transduction of libraries of
mutants into a suitable host cell; (2) expression of the encoded gene product(s), either
transiently or stably; (3) functional selection for cells with an improved phenotype (e.g.,
expression of a receptor with improved affinity for a target ligand, viral resistance, etc.);
(4) recovery of the mutant genes by, for example, PCR followed by preparation of Hirt
supernatants with subsequent transformation of E. coli; (5) RSR; and (6) repetition of steps
(1)-(5) until the desired degree of improvement is achieved.
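Steps (3) to (6) of the protocol above form a recursive select-recombine loop. The toy Python sketch below models only that loop: the `fitness` function is a hypothetical stand-in for transfection, expression and functional selection (steps (1) to (3)), recombination is simplified to a single crossover between two selected sequences (DNA shuffling typically yields multiple crossovers), and the mutation rate, pool size and target sequence are arbitrary illustrative choices. The library must contain at least two sequences of equal length.

```python
import random


def evolve(library, fitness, target, n_parents=10, max_rounds=50, mutation_rate=0.01, seed=0):
    """Toy recursive selection/recombination loop.

    Returns (best sequence, number of selection rounds performed).
    """
    rng = random.Random(seed)
    pool = list(library)
    for round_no in range(1, max_rounds + 1):
        # Functional selection: keep the best-performing variants.
        pool.sort(key=fitness, reverse=True)
        parents = pool[:n_parents]
        if fitness(parents[0]) >= target:
            return parents[0], round_no
        # Recombine pairs of selected genes (single crossover) and add point mutations.
        children = []
        for _ in range(len(pool) - 1):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))
            child = [rng.choice("ACGT") if rng.random() < mutation_rate else base
                     for base in a[:cut] + b[cut:]]
            children.append("".join(child))
        # Carry the current best forward so the loop never loses ground.
        pool = children + parents[:1]
    pool.sort(key=fitness, reverse=True)
    return pool[0], max_rounds


# Hypothetical usage: the "phenotype" is similarity to an arbitrary target sequence.
target_seq = "ACGT" * 10


def score(seq):
    return sum(x == y for x, y in zip(seq, target_seq))


setup_rng = random.Random(1)
library = ["".join(setup_rng.choice("ACGT") for _ in range(40)) for _ in range(50)]
best, rounds = evolve(library, score, target=len(target_seq), max_rounds=30)
```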

For example, previous work has shown that one can use mammalian surface
display to functionally select cells expressing cloned genes, such as using an antibody to
clone the gene for an expressed surface protein (reviewed by Seed, Curr. Opin. Biotechnol.
6:567-573 (1995)). Briefly, cells are transiently transfected with libraries of cloned genes
residing on replicating episomal vectors. An antibody directed against the protein of interest
(whose gene one wishes to clone) is immobilized on a solid surface such as a plastic dish,
and the transfected cells expressing the protein of interest are affinity selected.

For example, the affinity of an antibody for a ligand can be improved using
mammalian surface display and RSR. Antibodies with higher affinity for their cognate ligands
are then screened for improvement of one or more of the following properties: (1) improved
therapeutic properties (increased cell killing, neutralization of ligands, activation of signal
transduction pathways by crosslinking receptors), (2) improved in vivo imaging applications
(detection of the antibody by covalent/noncovalent binding of a radionuclide or any agent
detectable outside of the body by noninvasive means, such as NMR), (3) improved analytical
applications (ELISA detection of proteins or small molecules), and (4) improved catalysts
(catalytic antibodies). The methods described are general and can be extended to any
receptor-ligand pair of interest. A specific example is provided in the experimental section.

The use of a one mutant sequence-one transfected cell protocol is a preferred
design feature for RSR-based protocols, because the point is to use functional selection to
identify mutants with improved phenotypes and, if the transfection is not done in a "clonal"
fashion, the functional phenotype of any given cell is the result of the sum of multiple
transfected sequences. Protoplast fusion is one method to achieve this end, since each
protoplast typically contains more than 50 copies of a single plasmid variant.
However, it is a relatively low-efficiency process (about 10^3 - 10* transfectants), and it does
not work well on some non-adherent cell lines such as B cell lines. Retroviral vectors
provide a second alternative, but they are limited in the size of acceptable insert (<10 kb), and
consistent, high expression levels are sometimes difficult to achieve. Random integration
results in varying expression levels, thus introducing noise and limiting one's ability to
distinguish between improvements in the affinity of the mutant protein and increased
expression. A related class of strategies that can be used effectively to achieve
"one gene-one cell" DNA transfer and consistent expression levels for RSR is to use a viral
vector which contains a lox site and to introduce this into a host that expresses Cre
recombinase, preferably transiently, and contains one or more lox sites integrated into its
genome, thus limiting the variability of integration sites (Rohlman et al., Nature Biotech.
14:1562-1565 (1996)).

An alternative strategy is to transfect with limiting concentrations of plasmid
(i.e., about one copy per cell) using a vector that can replicate in the target cells, as is
the case with plasmids bearing SV40 origins transfected into COS cells. This strategy
requires that either the host cell or the vector supply a replication factor such as SV40 large
T antigen. Northrup et al. (J. Biol. Chem. 268:2917-2923 (1993)) describe a strategy wherein
a stable transfectant expressing SV40 large T antigen is then transfected with vectors
bearing SV40 origins. This format gave consistently higher transient expression and
demonstrable plasmid replication, as assayed by sensitivity to digestion by DpnI. Transient
expression (i.e., from non-integrating plasmids) is a preferred format for cellular display
selections because it reduces the cycle time and increases the number of mutants that can be
screened.
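Transfecting at "about one copy per cell" is commonly reasoned about with Poisson statistics. The sketch below assumes plasmid uptake per cell is Poisson-distributed with a given mean, which is an idealization; it reports the fraction of untransfected cells, the fraction carrying exactly one plasmid, and the share of transfected cells that are clonal.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def transfection_profile(mean_copies_per_cell):
    """Poisson model of plasmid uptake at a given mean copy number per cell (> 0)."""
    p0 = poisson_pmf(0, mean_copies_per_cell)
    p1 = poisson_pmf(1, mean_copies_per_cell)
    return {
        "untransfected": p0,
        "exactly_one": p1,
        # Among cells that took up DNA, the share carrying a single variant.
        "clonal_among_transfected": p1 / (1.0 - p0),
    }


one_copy = transfection_profile(1.0)
dilute = transfection_profile(0.1)
```

At a mean of one copy per cell, about 37% of cells receive exactly one plasmid and about 58% of transfected cells are clonal; lowering the mean to 0.1 raises the clonal share to about 95%, at the cost of far fewer transfected cells.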

The expression of SV40 large T antigen or other replication factors may have
deleterious effects on, or may work inefficiently in, some cells. In such cases, RSR is applied
to the replication factor itself to evolve mutants with improved activity in the cell type of
interest. A generic protocol for evolving such a factor is as follows:

The target cell is transfected with a vector containing SV40 large T antigen, an SV40
origin, and a reporter gene such as GFP; a related format is cotransfection with limiting
amounts of the SV40 large T antigen expression vector and an excess of a reporter such as
GFP cloned onto an SV40 origin-containing plasmid. Typically, after 1-10 days of transient
expression, the brightest cells are purified by FACS. SV40 large T antigen mutants are
recovered by PCR and subjected to mutagenesis. The cycle is repeated until the desired
level of improvement is obtained.
I. Autocrine Selection 
In some embodiments, mutant proteins are selected or screened based on
their ability to exert a biological effect in an autocrine fashion on the cell expressing the
mutant protein. For example, a library of alpha interferon genes can be selected for
induction of more potent or more specific antiviral activity as follows. A library of interferon
alpha mutants is generated in a vector which allows for induction of expression (e.g., under
control of a metallothionein promoter) and efficient secretion in a multiwell format (96-well,
for example) with one or a few independent clones per well. In some embodiments, the
promoter is not inducible, and may be constitutive.

Expression of the cloned interferon genes is induced. The cells are
challenged with a cytotoxic virus against which one wishes to evolve an optimized interferon
(for example, vesicular stomatitis virus or HIV). Surviving cells are recovered. The cloned
interferon genes are recovered by PCR amplification, subjected to RSR, cloned back
into the transfection vector, and retransfected into the host cells. These steps are repeated
until the desired level of antiviral activity is evolved.

In some embodiments, the virus of interest is not strongly cytotoxic. In this
case a conditionally lethal gene, such as herpes simplex virus thymidine kinase, is cloned
into the virus, and after challenge with virus and recovery, conditionally lethal selective
conditions are applied to kill cells that are infected with virus. An example of a conditionally
lethal gene is herpes TK, which becomes lethal upon treating cells expressing this gene with
the nucleoside analog acyclovir. In some embodiments, the antiproliferative activity of the
cloned interferons is selected by treating the cells with agents that kill dividing cells (for
example, DNA alkylating agents).

In some embodiments, potent cytokines are selected by expressing and
secreting a library of cytokines in cells that have GFP or another reporter under control of a
promoter that is induced by the cytokine, such as the MHC class I promoter being induced by
evolved variants of alpha interferon. The signal transduction pathway is configured such that
the wild-type cytokine to be evolved gives a weak but detectable signal.

J. Improved Serum Stability and Circulation Half-Life

In some embodiments of the invention, proteins are evolved by RSR to have
improved circulation half-life or stability in serum. A preferred method for improving half-life
is evolving the affinity of a protein of interest for a long-lived serum protein, such as an
antibody or other abundant serum protein. Examples of how affinity for an antibody can
enhance serum half-life include the co-administration of IL2 and anti-IL2 antibodies, which
increases the serum half-life and anti-tumor activity of human recombinant IL2 (Courtney et al.,
Immunopharmacology 28:223-232 (1994)).
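One way to see why affinity for a long-lived serum protein extends half-life is a two-state clearance model. The sketch below assumes rapid binding equilibrium, a carrier in large excess over the therapeutic protein, and first-order clearance of the free and bound forms. The carrier concentration (roughly 600 µM, an approximate figure for serum albumin) and the carrier half-life (about 19 days) are approximate assumptions, and the 5-minute free half-life only echoes the IL2 figure cited in this section; none of these are results from the source.

```python
import math


def effective_half_life(kd_um, carrier_um, t_half_free, t_half_carrier):
    """Half-life of a protein that binds a long-lived serum carrier.

    Assumes rapid binding equilibrium, carrier in large excess over the drug,
    and first-order clearance of the free and carrier-bound forms.
    Both half-lives must use the same time unit.
    """
    fraction_bound = carrier_um / (carrier_um + kd_um)
    k_free = math.log(2) / t_half_free
    k_carrier = math.log(2) / t_half_carrier
    # Observed elimination rate is the bound/free-weighted average of the two rates.
    k_eff = (1.0 - fraction_bound) * k_free + fraction_bound * k_carrier
    return math.log(2) / k_eff


# Illustrative inputs in minutes and uM (approximate assumptions, see above).
CARRIER_UM = 600.0
CARRIER_T_HALF = 19 * 24 * 60
weak = effective_half_life(10.0, CARRIER_UM, 5.0, CARRIER_T_HALF)
tight = effective_half_life(0.1, CARRIER_UM, 5.0, CARRIER_T_HALF)
```

In this model the small free fraction still dominates clearance unless binding is very tight, so the achievable gain depends strongly on the evolved affinity, not merely on the presence of a binding site.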

The eight most abundant human serum proteins are serum albumin,
immunoglobulins, lipoproteins, haptoglobin, fibrinogen, transferrin, alpha-1 antitrypsin, and
alpha-2 macroglobulin (Doolittle, in The Plasma Proteins, F. Putnam, ed., Academic
Press, 1984). These and other abundant serum proteins, such as ceruloplasmin and
fibronectin, are the primary targets against which to evolve binding sites on therapeutic
proteins, such as those in Table I, for the purpose of extending half-life. In the case of
antibodies, the preferred strategy is to evolve affinity for constant regions rather than variable
regions in order to minimize individual variation in the concentration of the relevant target
epitope (antibody V region usage between different individuals is significantly variable).

Binding sites of the desired affinity are evolved by applying phage display,
peptides-on-plasmids display or polysome display selections to the protein of interest. As a
source of diversity, one could randomly mutagenize an existing binding site or otherwise
defined region of the target protein, append a peptide library to the N terminus, C terminus,
or internally as a functionally nondisruptive loop, or use "family shuffling" of homologous
genes. DNA shuffling is particularly advantageous for problems where one wants to
simultaneously optimize two or more "uncorrelated" properties, such as improved affinity for
human serum albumin (HSA) while retaining biological activity.

In other embodiments of the invention, half-life is improved by derivatization
with PEG, other polymer conjugates or half-life extending chemical moieties. These are
established methods for extending the half-life of therapeutic proteins (R. Duncan, Clin.
Pharmacokinet. 27:290-306 (1994); Smith et al., BEECH 1 1 397-403 (1993)) and can have
the added benefit of reducing immunogenicity (R. Duncan, Clin. Pharmacokinet. 27:290-306
(1994)). However, derivatization can also result in reduced affinity of the therapeutic protein
for its receptor or ligand. RSR is used to discover alternative sites in the primary sequence
that can be substituted with lysine or other appropriate residues for chemical or enzymatic
conjugation with half-life extending chemical moieties, and which result in proteins with
maximal retention of biological activity.

A preferred strategy is to express a library of mutants of the protein in a
display format, derivatize the library with the agent of interest (e.g., PEG) using chemistry that
does not biologically inactivate the display system, select based on affinity for the cognate
receptor, PCR amplify the genes encoding the selected mutants, shuffle, reassemble,
reclone into the display format, and iterate until a mutant with the desired activity,
post-modification, is obtained. An alternative format is to express, purify and derivatize the
mutants in a high-throughput format, screen for mutants with optimized activity, recover the
corresponding genes, subject the genes to RSR and repeat.

In further embodiments of the invention, binding sites for target human
proteins that are localized in particular tissues of interest are evolved by RSR. For example,
an interferon can be evolved to contain a binding site for a liver surface protein, such as
hepatocyte growth factor receptor, such that the interferon partitions selectively onto liver
cells and has higher specific antiviral activity on liver cells. Such an evolved interferon could
be useful for treatment of hepatitis. Analogously, one could evolve affinity for abundant
epitopes on erythrocytes, such as ABO blood antigens, to localize a given protein to the
bloodstream.

In further embodiments of the invention, the protein of interest is evolved to
have increased stability to proteases. For example, the clinical use of IL2 is limited by
serious side effects that are related to the need to administer high doses. High doses are
required due to the short half-life (3-5 min, Lotze et al., JAMA 256(22):3117-3124 (1986)) and
the consequent need for high doses to maintain a therapeutic level of IL2. One of the factors
contributing to short half-lives of therapeutic proteins is proteolysis by serum proteases.
Cathepsin D, a major renal acid protease, is responsible for the degradation of IL2 in Balb/c
mice (Ohnishi et al., Cancer Res. 50:1107-1112 (1990)). Furthermore, Ohnishi showed that
treatment of Balb/c mice with pepstatin, a potent inhibitor of this protease, prolongs the
half-life of recombinant human IL2 and augments lymphokine-activated killer cell activity in
this mouse model.

Thus, evolution of variants of IL2, or of any of the proteins listed in Table I, that are
resistant to serum or kidney proteases is a preferred strategy for obtaining variants with
extended serum half-lives.

A preferred protocol is as follows. A library of the mutagenized protein of
interest is expressed in a display system with a gene-distal epitope tag (i.e., on the
N-terminus of a phage display construct, such that if it is cleaved off by proteases, the
epitope tag is lost). The expressed proteins are treated with defined proteases or with
complex cocktails such as whole human serum. Affinity selection with an antibody to the
gene-distal tag is performed. A second screen or selection demanding biological function
(e.g., binding to cognate receptor) is performed. Phage retaining the epitope tag (and hence
protease resistant) are recovered and subjected to RSR. The process is repeated until the
desired level of resistance is attained.
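The protocol above applies two filters: retention of the gene-distal tag after protease exposure, and retention of biological function. The sketch below assumes first-order cleavage of the tag; the variant names, rate constants, activities and thresholds are hypothetical.

```python
import math


def select_protease_resistant(variants, exposure_h, min_tag_retained, min_activity):
    """Apply both filters: tag retention after protease exposure, then function.

    `variants` holds (name, cleavage rate constant per hour, relative receptor
    binding) tuples; tag loss is modeled as first-order cleavage.
    """
    survivors = []
    for name, k_cleave, activity in variants:
        tag_retained = math.exp(-k_cleave * exposure_h)
        if tag_retained >= min_tag_retained and activity >= min_activity:
            survivors.append(name)
    return survivors


library = [("wt", 2.0, 1.0), ("m1", 0.1, 0.9), ("m2", 0.05, 0.2)]
hits = select_protease_resistant(library, exposure_h=1.0, min_tag_retained=0.5, min_activity=0.5)
```

Here the wild type loses its tag and is discarded, while `m2` keeps its tag but fails the functional filter, which is why the second screen is needed.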

In other embodiments, the procedure is performed in a screening format 
wherein mutant proteins are expressed and purified in a high throughput format and 
screened for protease resistance with retention of biological activity. 

In further embodiments of the invention, the protein of interest is evolved to
have increased shelf life. A library of the mutagenized nucleic acid sequence encoding the
protein of interest is expressed in a display format or high-throughput expression format, and
exposed for various lengths of time to conditions for which one wants to evolve stability
(heat, metal ions, nonphysiological pH of, for example, <6 or >8, lyophilization,
freeze-thawing). Genes are recovered from functional survivors, for example, by PCR. The DNA
is subjected to mutagenesis, such as RSR, and the process repeated until the desired level of
improvement is achieved.
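Survivor data from several exposure times can be summarized as an inactivation half-life for ranking mutants. The sketch below assumes simple first-order inactivation and fits ln(residual activity) = -kt through the origin by least squares; the mutant names and activity values are hypothetical.

```python
import math


def inactivation_half_life(times, residual_activity):
    """Fit ln(residual) = -k * t through the origin; return ln(2) / k.

    `residual_activity` is activity after each exposure time, as a fraction of
    unexposed activity. Returns infinity if no loss of activity is detected.
    """
    num = sum(t * math.log(a) for t, a in zip(times, residual_activity))
    den = sum(t * t for t in times)
    k = -num / den
    return math.inf if k <= 0 else math.log(2) / k


# Hypothetical residual activities after 1, 2 and 4 days of exposure.
days = [1, 2, 4]
mutants = {
    "wt": [0.50, 0.25, 0.0625],
    "m7": [0.90, 0.81, 0.66],
}
ranking = sorted(mutants, key=lambda m: inactivation_half_life(days, mutants[m]), reverse=True)
```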
The case of IFN presents an opportunity to evolve recombinants with
improved half-life. There are > 10* possible recombinations of the amino acid diversity in
this family. Since these recombinants are formed from segments of wild-type IFN genes,
relatively few if any novel T cell epitopes will be created by the process. Molecules that are
highly active are likely to closely resemble natural interferons structurally, and thus present
few if any novel B cell epitopes. This creates a situation wherein the ability to create large
libraries of recombinants can be combined with the power of phage panning to select for
recombinants with affinity for abundant serum proteins such as human serum albumin.
Proteins with affinity for long-lived, abundant serum proteins have been shown to have
enhanced serum half-lives. Thus, one could obtain IFN recombinants with lengthened serum
half-lives by using phage panning to select for recombinants which have affinity for proteins
such as HSA. Since binding to HSA, or the mutations which create affinity for HSA, may
abrogate or substantially reduce IFN activity, one would have to counter-screen for retention
of potent IFN activity. By applying phage panning, activity screening, and shuffling iteratively,
one could obtain recombinants with high activity and a desired level of affinity for target
serum proteins. The half-lives of candidate IFNs can be tested in transgenic mice
expressing the human serum protein as a neo-self protein.

These approaches can be generalized to other proteins for which there exist
multiple homologous human allelic or nonallelic forms. The approach can also be
generalized further to be applied to proteins with no non-allelic human homologs, such as
IL2. The gene for IL2 would be shuffled with IL2 genes from other mammals, with a
preference for closely related mammals such as the primates. Recombination of the "natural
diversity" defined by these homologs is expected to generate very high-quality libraries with
many active and superior molecules, as was seen for the activities of the shuffled interferons
in human and mouse cells.

K. Evolved Single Chain Versions of Multisubunit Factors

As discussed above, in some embodiments of the invention, the substrate for
evolution by RSR is preferably a single chain construct. The possibility of performing
asymmetric mutagenesis on constructs of homomultimeric proteins provides important new
pathways for further evolution of such constructs that are not open to the proteins in their
natural homomultimeric states. In particular, a given mutation in a homomultimer will result
in that change being present in each identical subunit. In single chain constructs, however,
the domains can mutate independently of each other.
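To make the gain concrete: if k candidate point mutations are considered per subunit and each can be present or absent, a homodimer can only sample subsets that appear identically in both subunits, whereas a two-domain single chain construct samples the two subsets independently. This back-of-envelope count assumes two subunits and ignores whether particular combinations fold or function:

```python
def variant_counts(k):
    """Distinct mutation combinations available when k candidate sites exist per subunit."""
    homodimer = 2 ** k             # one subset of mutations, copied into every subunit
    single_chain = (2 ** k) ** 2   # each linked domain picks its own subset independently
    return homodimer, single_chain


print(variant_counts(5))  # (32, 1024)
```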

Conversion of multisubunit proteins to single chain constructs with new and
useful properties has been demonstrated for a number of proteins. Most notably, antibody
heavy and light chain variable domains have been linked into single chain Fv's (Bird et al.,
Science 242:423-426 (1988)), and this strategy has resulted in antibodies with improved
thermal stability (Young et al., FEBS Lett. 377:135-139 (1995)) or sensitivity to proteolysis
(Solar et al., Prot. Eng. 8:717-723 (1995)). A functional single chain version of IL5, a
homodimer, has been constructed and shown to have affinity for the IL5 receptor similar to that
of the wild type protein, and this construct has been used to perform asymmetric mutagenesis of
the dimer (Li et al., J. Biol. Chem. 271:1817-1820 (1996)). A single chain version of
urokinase-type plasminogen activator has been made, and it has been shown that the single
chain construct is more resistant to plasminogen activator inhibitor type 1 than the native
homodimer (Higazi et al., Blood 87:3545-3549 (1996)). Finally, a single-chain insulin-like
growth factor I/insulin hybrid has been constructed and shown to have higher affinity for
chimeric insulin/IGF-1 receptors than either natural ligand (Kristensen et al., Biochem. J.
305:981-986 (1995)).

In general, a linker is constructed which joins the amino terminus of one
subunit of a protein of interest to the carboxyl terminus of another subunit in the complex.
These fusion proteins can consist of linked versions of homodimers, homomultimers,
heterodimers or higher order heteromultimers. In the simplest case, one adds polypeptide
linkers between the native termini to be joined. Two significant variations can be made.
First, one can construct diverse libraries of variations of the wild type sequence in and
around the junctions and in the linkers to facilitate the construction of active fusion proteins.
Second, Zhang et al. (Biochemistry 32:12311-12318 (1993)) have described circular
permutations of T4 lysozyme in which the native amino and carboxyl termini have been
joined and novel amino and carboxyl termini have been engineered into the protein. The
methods of circular permutation, libraries of linkers, and libraries of junctional sequences
flanking the linkers allow one to construct libraries that are diverse in topological linkage
strategies and in primary sequence. These libraries are expressed and selected for activity.
Any of the above mentioned strategies for screening or selection can be used, with phage
display being preferable in most cases. Genes encoding active fusion proteins are
recovered, mutagenized, reselected, and subjected to standard RSR protocols to optimize
their function. Preferably, a population of selected mutant single chain constructs is PCR
amplified in two separate PCR reactions such that each of the two domains is amplified
separately. Oligonucleotides are derived from the 5' and 3' ends of the gene and from both
strands of the linker. The separately amplified domains are shuffled in separate reactions,
then the two populations are recombined using PCR reassembly to generate intact single
chain constructs for further rounds of selection and evolution.
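The domain-by-domain shuffling and reassembly just described can be modelled in a few lines. This is a toy sketch under stated assumptions: sequences are plain strings, each domain is recombined only with other variants of the same domain by a single random crossover, and simple concatenation across a shared, hypothetical linker stands in for PCR reassembly:

```python
import random


def split_at_linker(construct, linker):
    """Split a single chain construct into (domain1, domain2) at the first linker match."""
    left, sep, right = construct.partition(linker)
    if not sep:
        raise ValueError("linker not found in construct")
    return left, right


def crossover(a, b, rng):
    """Single-crossover recombination between two variants of the same domain."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def shuffle_domains_separately(selected, linker, n_out, rng):
    """Shuffle each domain only with its own kind, then rejoin across the shared linker."""
    pairs = [split_at_linker(c, linker) for c in selected]
    domain1_pool = [d1 for d1, _ in pairs]
    domain2_pool = [d2 for _, d2 in pairs]
    rebuilt = []
    for _ in range(n_out):
        d1 = crossover(rng.choice(domain1_pool), rng.choice(domain1_pool), rng)
        d2 = crossover(rng.choice(domain2_pool), rng.choice(domain2_pool), rng)
        rebuilt.append(d1 + linker + d2)  # stands in for PCR reassembly via the linker overlap
    return rebuilt


linker = "GGGGGG"  # hypothetical linker; the toy domains below contain no G
selected = ["AAAACCCC" + linker + "TTTTAAAA",
            "CCCCAAAA" + linker + "AAAATTTT"]
library = shuffle_domains_separately(selected, linker, n_out=4, rng=random.Random(1))
print(library)
```

Keeping the two pools apart ensures that domain 1 sequence never ends up on the domain 2 side of the linker, which mirrors the purpose of amplifying and shuffling the domains in separate reactions.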
V. Improved Properties of Pharmaceutical Proteins

A. Evolved Specificity for Receptor or Cell Type of Interest

The majority of the proteins listed in Table I are either receptors or ligands of
pharmaceutical interest. Many agonists such as chemokines or interleukins agonize more
than one receptor. Evolved mutants with improved specificity may have reduced side effects
due to their loss of activity on receptors which are implicated in a particular side effect profile.
For most of these ligands and receptors, mutant forms with improved affinity would have improved
pharmaceutical properties. For example, an antagonistic form of RANTES with improved
affinity for CCR5 or CXCR4 or both should be an improved inhibitor of HIV infection by virtue
of achieving greater receptor occupancy for a given dose of the drug. Using the selections
and screens outlined above in combination with RSR, the affinities and specificities of any of
the proteins listed in Table I can be improved. For example, the mammalian display format
could be used to evolve TNF receptors with improved affinity for TNF.
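The receptor occupancy argument can be checked with the simple 1:1 binding isotherm, occupancy = [L]/([L] + Kd). The sketch assumes equilibrium, a single class of sites, and a free antagonist concentration approximately equal to the dose at the receptor; the concentrations are illustrative, not measured values for RANTES variants:

```python
def fractional_occupancy(ligand_nM, kd_nM):
    """Equilibrium fraction of receptors bound for simple 1:1 binding (free ligand ~ dose)."""
    return ligand_nM / (ligand_nM + kd_nM)


dose_nM = 10.0  # illustrative free antagonist concentration at the receptor
for kd_nM in (100.0, 10.0, 1.0):
    print(f"Kd = {kd_nM:g} nM -> occupancy {fractional_occupancy(dose_nM, kd_nM):.2f}")
```

At this dose, a 10-fold affinity gain (Kd from 10 nM to 1 nM) raises occupancy from 0.50 to about 0.91, which is the sense in which a higher affinity antagonist achieves more occupancy for the same dose.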

Other examples include evolved interferon alpha variants that arrest tumor 
cell proliferation but do not stimulate NK cells, IL2 variants that stimulate the low affinity IL2 
receptor complex but not the high affinity receptor (or vice versa), superantigens that 
stimulate only a subset of the V beta proteins recognized by the wild type protein (preferably 
a single V beta), antagonistic forms of chemokines that specifically antagonize only a 
receptor of interest, antibodies with reduced cross-reactivity, and chimeric factors that 
specifically activate a particular receptor complex. As an example of this latter case, one 
could make chimeras between IL2 and IL4, 7, 9, or 15 that also can bind the IL2 receptor 
alpha, beta and gamma chains (Theze et al., Imm. Today 17:481-486 (1996)), and select for 
chimeras that retain binding for the intermediate affinity IL2 receptor complex on monocytes 
but have reduced affinity for the high affinity IL2 alpha, beta, gamma receptor complex on 
activated T cells.

B. Evolved Agonists with Increased Potency

In some embodiments of the invention, a preferred strategy is the selection or
screening for mutants with increased agonist activity using the whole cell formats described
above, combined with RSR. For example, a library of mutants of IL3 is expressed in active
form on phage or phagemids as described by Gram et al. (J. Immunol. Meth. 161:169-176
(1993)). Clonal lysates resulting from infection with plaque-purified phage are prepared in a
high-throughput format such as a 96-well microtiter format. An IL3-dependent cell line
expressing a reporter gene such as GFP is stimulated with the phage stocks in a high-throughput
96-well format. Phage that result in positive signals at the greatest dilution of phage
supernatants are recovered; alternatively, DNA encoding the mutant IL3 can be recovered by
PCR. In some embodiments, single cells expressing GFP under control of an IL3 responsive
promoter can be stimulated with the IL3 phagemid library, and the positive cells are FACS
sorted. The recovered nucleic acid is then subjected to PCR, and the process repeated until
the desired level of improvement is obtained.
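Ranking clones by "positive signal at the greatest dilution" is an endpoint-dilution readout. A minimal sketch, assuming a 10-fold dilution series per clone and a fixed positivity cutoff; the well names, GFP readings, and threshold are hypothetical:

```python
def highest_positive_dilution(signals, dilutions, threshold):
    """Largest dilution factor at which the reporter signal is still >= threshold (None if never)."""
    positive = [d for d, s in zip(dilutions, signals) if s >= threshold]
    return max(positive) if positive else None


dilutions = [10, 100, 1_000, 10_000]  # serial 10-fold dilutions of each phage supernatant
plate = {                             # hypothetical GFP readings per clone (arbitrary units)
    "A1": [950, 600, 120, 40],
    "A2": [980, 870, 640, 300],
    "A3": [400, 90, 30, 20],
}
threshold = 250                       # assumed cutoff for a positive well


def endpoint(well):
    return highest_positive_dilution(plate[well], dilutions, threshold) or 0


ranked = sorted(plate, key=endpoint, reverse=True)
print(ranked)  # ['A2', 'A1', 'A3']
```

An endpoint, rather than the signal at a single dilution, separates clones whose responses would all saturate the reporter in undiluted supernatant.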

Table I

POLYPEPTIDE CANDIDATES FOR EVOLUTION

Name
Alpha-1 antitrypsin
Angiostatin
Antihemophilic factor
Apolipoprotein
Apoprotein
Atrial natriuretic factor
Atrial natriuretic polypeptide
Atrial peptides
Bacillus thuringiensis toxins (Bt toxins)
C chemokines (e.g., Lymphotactin)
C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG)
Calcitonin
CC chemokines (e.g., Monocyte chemoattractant protein-1, Monocyte chemoattractant protein-2, Monocyte chemoattractant protein-3, Monocyte inflammatory protein-1 alpha, Monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262)
CD40 ligand
Ciliary neurotrophic factor (CNTF)
Collagen
Colony stimulating factor (CSF, G-CSF, GM-CSF, M-CSF)
Complement factor 5a
Complement inhibitor
Complement receptor 1
Epidermal growth factor (EGF)
Erythropoietin
Factor IX
Factor VII
Factor VIII
Factor X
Fibrinogen
Fibronectin
FLT-3 receptor antagonist
Glucocerebrosidase
Gonadotropin
Growth hormone
Hedgehog proteins (e.g., Sonic, Indian, Desert)
Hemoglobin (for blood substitute; for radiosensitization)
Hirudin
Human serum albumin
Insulin
Interferon gamma
Interleukin 20 (melanoma differentiation associated gene 7)
Interleukins (1 to 18)
Lactoferrin
Leptin
Leukemia inhibitory factor (LIF)
Luciferase
Neurturin
Neutrophil inhibitory factor (NIF)
Oncostatin-M
Osteogenic protein
Parathyroid hormone
Protein A
Protein G
RANK (receptor activator of NF-kB)
RANK ligand
Relaxin
Renin
Salmon calcitonin
Salmon growth hormone
Soluble CD4
Soluble CD28
Soluble CD40
Soluble CD40 ligand
Soluble CD80 (B7-1)
Soluble CD86 (B7-2)
Soluble CD150 (SLAM)
Soluble CD152 (CTLA-4)
Soluble complement receptor I
Soluble ICAM-1
Soluble IFN gamma receptor
Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20)
Soluble leptin receptor
Soluble RANK
Soluble TNF receptor
Somatomedin
Somatostatin
Somatotropin
Stem cell factor
Streptokinase
Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Toxic shock syndrome toxin (TSST-1), Exfoliating toxins A and B, Pyrogenic exotoxins A, B, and C, and M. arthritidis mitogen
Superoxide dismutase
Thrombopoietin
Thymosin alpha 1
Tissue plasminogen activator
Transforming growth factor beta
Tumor necrosis factor beta (TNF beta)
Tumor necrosis factor receptor (TNFR)
Tumor necrosis factor-alpha (TNF alpha)
Urokinase
Viral IL10 homologs

C. Evolution of Components of Eukaryotic Signal Transduction or
Transcriptional Pathways

Using the screens and selections listed above, RSR can be used in several
ways to modify eukaryotic signal transduction or transcriptional pathways. Any component of
a signal transduction pathway of interest can be evolved, as can the regulatory regions and the
transcriptional activators that interact with these regions and with chemicals that induce
transcription. This generates regulatory systems in which transcription is activated more potently
by the natural inducer or by analogues of the normal inducer. This technology is preferred
for the development and optimization of diverse assays of biotechnological interest. For
example, dozens of seven transmembrane (7-TM) receptors are validated targets for drug
discovery (see, for example, Siderovski et al., Curr. Biol. 6(2):211-212 (1996); An et al.,
FEBS Lett. 375(1-2):121-124 (1995); Raport et al., Gene 163(2):295-299 (1995); Song et
al., Genomics 28(2):347-349 (1995); Strader et al., FASEB J. 9(9):745-754 (1995); Benka et
al., FEBS Lett. 363(1-2):49-52 (1995); Spiegel, J. Clin. Endocrinol. Metab. 81(7):2434-2442
(1996); Post et al., FASEB J. 10(7):741-749 (1996); Reisine et al., Ann. N.Y. Acad. Sci.
780:168-175 (1996); Spiegel, Annu. Rev. Physiol. 58:143-170 (1996); Barak et al.,
Biochemistry 34(47):15407-15414 (1995); and Shenker, Baillieres Clin. Endocrinol. Metab.
9(3):427-451 (1995)). The development of sensitive high-throughput assays for agonists and
antagonists of these receptors is essential for exploiting the full potential of combinatorial
chemistry in discovering such ligands. Additionally, biodetectors or biosensors for different
chemicals can be developed by evolving 7-TMs to respond agonistically to novel chemicals
or proteins of interest. In this case, selection would be for constructs that are activated by the
new chemical or polypeptide to be detected. Screening could be done simply with
fluorescence- or light-activated cell sorting, since the desired improvement is coupled to light
production.

In addition to detection of small molecules such as pharmaceutical drugs and
environmental pollutants, biosensors can be developed that will respond to any chemical for
which there are receptors, or for which receptors can be evolved by recursive sequence
recombination, such as hormones, growth factors, metals and drugs. The receptors may be 
intracellular and direct activators of transcription, or they may be membrane bound receptors 
that activate transcription of the signal indirectly, for example by a phosphorylation cascade. 
They may also not act on transcription at all, but may produce a signal by some
post-transcriptional modification of a component of the signal generating pathway. These
receptors may also be generated by fusing domains responsible for binding different ligands
with different signalling domains. Again, recursive sequence recombination can be used to
increase the amplitude of the signal generated, to optimize expression and functioning of
chimeric receptors, and to alter the specificity of the chemicals detected by the receptor.

For example, G proteins can be evolved to efficiently couple mammalian 7-TM
receptors to yeast signal transduction pathways. There are 23 presently known G alpha
protein loci in mammals, which can be grouped by sequence and functional similarity into four
groups: Gs (Gnas, Gnal), Gi (Gnai-2, Gnai-3, Gnai-1, Gnao, Gnat-1, Gnat-2, Gnaz), Gq
(Gnaq, Gna-11, Gna-14, Gna-15) and G12 (Gna-12, Gna-13) (B. Nürnberg et al., J. Mol.
Med. 73:123-132 (1995)). They possess an endogenous GTPase activity allowing
reversible functional coupling between ligand-bound receptors and downstream effectors
such as enzymes and ion channels. G alpha proteins are complexed noncovalently with G
beta and G gamma proteins as well as with their cognate 7-TM receptors. Receptor and
signal specificity are controlled by the particular combination of G alpha, G beta (of which
there are five known loci) and G gamma (seven known loci) subunits. Activation of the
heterotrimeric complex by ligand-bound receptor results in dissociation of the complex into G
alpha monomers and G beta/gamma dimers, which then transmit signals by associating with
downstream effector proteins. The G alpha subunit is believed to be the subunit that
contacts the 7-TM, and thus it is a focal point for the evolution of chimeric or evolved G alpha
subunits that can transmit signals from mammalian 7-TMs to yeast downstream genes.

Yeast based bioassays for mammalian receptors will greatly facilitate the
discovery of novel ligands. Kang et al. (Mol. Cell. Biol. 10:2582-2590 (1990)) have described
the partial complementation of yeast strains bearing mutations in SCG1 (GPA1), a
homologue of the alpha subunits of G proteins involved in signal transduction in mammalian
cells, by mammalian and hybrid yeast/mammalian G alpha proteins. These hybrids have
partial function, such as complementing the growth defect in scg1 strains, but do not allow
mating and hence do not fully complement function in the pheromone signal transduction
pathway. Price et al. (Mol. Cell. Biol. 15:6188-6195 (1995)) have expressed rat somatostatin
receptor subtype 2 (SSTR2) in yeast and demonstrated transmission of ligand binding
signals by this 7-TM receptor through yeast and chimeric mammalian/yeast G alpha subunits
("coupling") to a HIS3 reporter gene under control of the pheromone responsive promoter
FUS-1, enabling otherwise HIS3(-) cells to grow on minimal medium lacking histidine.

Such strains are useful as reporter strains for mammalian receptors, but
suffer from important limitations, as exemplified by the study of Kang et al., where there
appears to be a block in the transmission of signals from the yeast pheromone receptors to
the mammalian G proteins. In general, to couple a mammalian 7-TM receptor to yeast signal
transduction pathways, one couples the mammalian receptor to yeast, mammalian, or
chimeric G alpha proteins, and these will in turn productively interact with downstream
components in the pathway to induce expression of a pheromone responsive promoter such
as FUS-1. Such functional reconstitution is commonly referred to as "coupling".

The methods described herein can be used to evolve the coupling of
mammalian 7-TM receptors to yeast signal transduction pathways. A typical approach is as
follows: (1) clone a 7-TM of interest into a yeast strain with a modified pheromone response
pathway similar to that described by Price (e.g., strains deficient in FAR1, a negative
regulator of G1 cyclins, and deficient in SST2, a deficiency that causes the cells to be
hypersensitive to the presence of pheromone), (2) construct libraries of chimeras between the mammalian G
alpha protein(s) known or thought to interact with the GPA1 or homologous yeast G alpha
proteins, (3) place a selectable reporter gene such as HIS3 under control of the pheromone
responsive promoter FUS1 (Price et al., Mol. Cell Biol. 15:6188-6195 (1995)); alternatively,
a screenable gene such as luciferase may be placed under the control of the FUS1
promoter, (4) transform library (2) into strain (3) (HIS3(-)), (5) screen or select for expression
of the reporter in response to the ligand of interest, for example by growing the library of
transformants on minimal plates in the presence of ligand to demand HIS3 expression, and (6)
recover the selected cells and apply RSR to evolve improved expression of the reporter
under the control of the pheromone responsive promoter FUS1.
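Steps (5) and (6) form one turn of a select-and-recombine cycle that is then repeated. As a toy illustration only, the loop below models each chimera as a set of beneficial mutations scored additively by a hypothetical reporter readout; real coupling efficiency is not additive, and uniform crossover is a simplification of DNA shuffling:

```python
import random


def rsr(population, score, n_rounds, keep, rng):
    """Toy recursive sequence recombination: select the best, recombine them, repeat.

    population: equal-length tuples of 0/1 (1 = a beneficial mutation is present)
    score: maps a genotype to a reporter readout (higher is better)
    keep: number of selected parents per round (must be >= 2)
    """
    size = len(population)
    for _ in range(n_rounds):
        parents = sorted(population, key=score, reverse=True)[:keep]  # selection step
        children = []
        while len(children) < size - keep:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each site is inherited from one of the two parents.
            children.append(tuple(rng.choice(pair) for pair in zip(a, b)))
        population = parents + children  # parents carried over, so the best score never drops
    return max(population, key=score)


# Twelve clones, each carrying a single, different beneficial mutation.
start = [tuple(int(i == j) for i in range(12)) for j in range(12)]
best = rsr(start, score=sum, n_rounds=10, keep=6, rng=random.Random(42))
print(best, sum(best))
```

Only mutations carried by clones that survive the first selection can appear later, which is why the stringency of each selection round matters as much as the number of rounds.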

A second important consideration in evolving strains with optimized reporter
constructs for signal transduction pathways of interest is optimizing the signal to noise ratio
(the ratio of gene expression under inducing vs noninducing conditions). Many 7-TM
pathways are leaky, such that the maximal induction of a typical reporter gene is 5 to 10-fold
over background. This range of signal to noise may be insufficient to detect small effects in
many high-throughput assays. Therefore, it is of interest to couple the 7-TM pathway to a
second nonlinear amplification system that is tuned to be below but near the threshold of
activation in the uninduced state. An example of a nonlinear amplification system is
expression of genes driven by the lambda PL promoter. Complex cooperative interactions
between lambda repressor molecules bound at three adjacent sites in the cI promoter result in very
efficient repression above a certain concentration of repressor. Below a critical threshold,
dramatic induction is seen, and there is a window within which a small decrease in repressor
concentration leads to a large increase in gene expression (Ptashne, A Genetic
Switch: Phage Lambda and Higher Organisms, Blackwell Scientific Publ., Cambridge, MA,
1992). Analogous effects are seen for some eukaryotic promoters such as those regulated
by GAL4. Placing the expression of a limiting component of a transcription factor for such a
promoter (GAL4) under the control of a GAL4 enhanced 7-TM responsive promoter results
in small levels of induction of the 7-TM pathway signal being amplified to a much larger
change in the expression of a reporter construct also under the control of a GAL4 dependent
promoter.
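The benefit of operating near a cooperative threshold can be seen with a Hill-type repression function, output = 1/(1 + (R/K)^n), where R is repressor concentration, K the half-repression concentration, and n the Hill coefficient. This is a textbook simplification chosen for illustration, not a model fitted to lambda or GAL4 data:

```python
def expression(repressor, k=1.0, n=1.0):
    """Relative promoter output under Hill-type repression (n > 1 models cooperativity)."""
    return 1.0 / (1.0 + (repressor / k) ** n)


def fold_induction(r_before, r_after, n):
    """Change in output when repressor falls from r_before to r_after."""
    return expression(r_after, n=n) / expression(r_before, n=n)


# The same 2-fold drop in repressor (from 2K to K) with and without cooperativity:
print(round(fold_induction(2.0, 1.0, n=1), 2))  # 1.5
print(round(fold_induction(2.0, 1.0, n=3), 2))  # 4.5
```

With n = 3 the same input change produces a threefold larger response than with n = 1, which is the amplification the coupled system is designed to exploit.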

An example of such a coupled system is to place GAL4 under control of the
FUS-1 pheromone responsive promoter and to have the intracellular GAL4 (itself a
transcriptional enhancer) level positively feed back on itself by placing a GAL4 binding site
upstream of the FUS-1 promoter. A reporter gene is also put under the control of a GAL4
activated promoter. This system is designed so that GAL4 expression will nonlinearly
self-amplify and co-amplify expression of a reporter gene such as luciferase upon reaching a
certain threshold in the cell. RSR can be used to great advantage to evolve reporter
constructs with the desired signaling properties, as follows: (1) A single plasmid construct is
made which contains both the GAL4/pheromone pathway regulated GAL4 gene and the
GAL4 regulated reporter gene. (2) This construct is mutagenized and transformed into the
appropriately engineered yeast strain expressing a 7-TM and chimeric yeast/mammalian
protein of interest. (3) Cells are stimulated with agonists and screened (or selected) based
on the activity of the reporter gene. In a preferred format, luciferase is the reporter gene and
activity is quantitated before and after stimulation with the agonist, thus allowing for a
quantitative measurement of signal to noise for each colony. (4) Cells with improved reporter
properties are recovered, the constructs are shuffled, and RSR is applied to further evolve
the plasmid to give optimal signal to noise characteristics.
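The per-colony measurement in step (3) reduces to a ratio of induced to uninduced reporter activity. A minimal sketch with hypothetical luminometer counts; the `floor` parameter is an assumption added to avoid dividing by a near-zero background:

```python
def signal_to_noise(uninduced, induced, floor=1.0):
    """Induced / uninduced luciferase signal; `floor` guards against near-zero backgrounds."""
    return induced / max(uninduced, floor)


colonies = {  # hypothetical luminometer counts: (before agonist, after agonist)
    "c1": (500, 3000),
    "c2": (50, 2000),
    "c3": (20, 150),
}
ranked = sorted(colonies, key=lambda c: signal_to_noise(*colonies[c]), reverse=True)
print(ranked)  # ['c2', 'c3', 'c1']
```

Colony c1 has the brightest induced signal but the poorest ratio because of its leaky background, which is why the before-and-after measurement, not raw brightness, is the basis for selection.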

These approaches are general and illustrate how any component of a signal
transduction pathway or transcription factor could be evolved using RSR and the screens
and selections described above. For example, these specific methods could be used to
evolve 7-TM receptors with specificity for novel ligands, specificity of nuclear receptors for
novel ligands (for example to obtain herbicide or other small molecule-inducible expression
of genes of interest in transgenic plants, such that a given set of genes can be induced upon
treatment with a given chemical agent), specificity of transcription factors to be responsive to
viral factors (thus inducing antiviral or lethal genes in cells expressing this transcription factor,
such as transgenic cells or cells treated with gene therapy constructs), or specificity of
transcription factors for activity in cancer cells (for example p53 deficient cells, thus allowing
one to infect with gene therapy constructs expressing conditionally lethal genes in a tumor
specific fashion).
The following examples are offered by way of illustration, not by way of limitation.

EXPERIMENTAL EXAMPLES 
I. Evolution of BIAP 

A preferred strategy to evolve BIAP is as follows. A codon usage library is
constructed from 60-mer oligonucleotides such that the central 20 bases of each oligo
encode the wild-type protein sequence using degenerate codons. Preferably, very rare
codons for the prokaryotic host of choice, such as E. coli, are not used. The 20 bases at
each end of the oligo use non-degenerate, but preferred, codons in E. coli. The
oligonucleotides are assembled into full-length genes as described above.
The assembled products are cloned into an expression vector by techniques well known in
the art. In some embodiments, the codon usage library is expressed with a library of
secretory leader sequences, each of which directs the encoded BIAP protein to the E. coli
periplasm. A library of leader sequences is used to optimize the combination of leader
sequence and mutant. Examples of leader sequences are reviewed by Seriate et al. (Ann.
Rev. Genet. 24:215-248 (1990)). The cloned BIAP genes are expressed under the control of
an inducible promoter such as the arabinose promoter. Arabinose-induced colonies are
screened by spraying with a substrate for BIAP, bromo-chloro-indolyl phosphate (BCIP). The
bluest colonies are picked visually and subjected to the RSR procedures described herein.

The oligonucleotides for construction of the codon usage library are listed in
Table II. The locations of these oligonucleotides are shown in Figure 1.

Table II 

1. AACCCTCCAG TTCCGAACCC CATATGATGA TCACCCTGCG TAAACTGCCG
2. AACCCTCCAG TTCCGAACCC CATATGAAAA AAACCGCT
3. AACCCTCCAG TTCCGAACCC ATATACATAT GCGTGCTAAA
4. AACCCTCCAG TTCCGAACCC CATATGAAAT ACCTGCTGCC GACC
5. AACCCTCCAG TTCCGAACCC GATATACATA TGAAACAGTC
6. TGGTGTTATG TCTGCTCAGG CDATGGCDGT DGAYTTYCAY CTGGTTCCGG TTGAAGAGGA
7. GGCTGGTTTC GCTACCGTTG CDCARGCDGC DCCDAARGAY CTGGTTCCGG TTGAAGAGGA
8. CACCCCGATC GCTATCTCTT CYTTYGCDTC YACYGGYTCY CTGGTTCCGG TTGAAGAGGA
9. GCTGCTGGCT GCTCAGCCGG CDATGGCDAT GGAYATYGGY CTGGTTCCGG TTGAAGAGGA
10. TGCCGCTGCT GTTCACCCCG GTDACYAARG CDGCDCARGT DCTGGTTCCG GTTGAAGAGG A
11. CCCGGCTTTC TGGAACCGTC ARGCDGCDCA RGCDCTGGAC GTTGCTAAAA AACTGCAGCC
12. ACGTTATCCT GTTCCTGGGT GAYGGYATGG GYGTDCCDAC CGTTACCGCT ACCCGTATCC
13. AAACTGGGTC CGGAAACCCC DCTGGCDATG GAYCARTTYC CGTACGTTGC TCTGTCTAAA
14. GGTTCCGGAC TCTGCTGGTA CYGCDACYGC DTAYCTGTGC GGTGTTAAAG GTAACTACCG
15. CTGCTCGTTA CAACCAGTGC AARACYACYC GYGGYAAYGA AGTTACCTCT GTTATGAACC
16. TCTGTTGGTG TTGTTACCAC YACYCGYGTD CARCAYGCDT CTCCGGCTGG TGCTTACGCT
17. GTACTCTGAC GCTGACCTGC CDGCDGAYGC DCARATGAAC GGTTGCCAGG ACATCGCTGC
18. ACATCGACGT TATCCTGGGT GGYGGYCGYA ARTAYATGTT CCCGGTTGGT ACCCCGGACC
19. TCTGTTAACG GTGTTCGTAA RCGYAARCAR AAYCTGGTDC AGGCTTGGCA GGCTAAACAC
20. GAACCGTACC GCTCTGCTGC ARGCDGCDGA YGAYTCYTCT GTTACCCACC TGATGGGTCT
21. AATACAACGT TCAGCAGGAC CAYACYAARG AYCCDACYCT GCAGGAAATG ACCGAAGTTG
22. AACCCGCGTG GTTTCTACCT GTTYGTDGAR GGYGGYCGYA TCGACCACGG TCACCACGAC
23. GACCGAAGCT GGTATGTTCG AYAAYGCDAT YGCDAARGCT AACGAACTGA CCTCTGAACT
24. CCGCTGACCA CTCTCACGTT TTYTCYTTYG GYGGYTAYAC CCTGCGTGGT ACCTCTATCT
25. GCTCTGGACT CTAAATCTTA YACYTCYATY CTGTAYGGYA ACGGTCCGGG TTACGCTCTG
26. CGTTAACGAC TCTACCTCTG ARGAYCCDTC YTAYCARCAG CAGGCTGCTG TTCCGCAGGC
27. AAGACGTTGC TGTTTTCGCT CGYGGYCCDC ARGCDCAYCT GGTTCACGGT GTTGAAGAAG
28. ATGGCTTTCG CTGGTTGCGT DGARCCDTAY ACYGAYTGYA ACCTGCCGGC TCCGACCACC
29. TGCTCACCTG GCTGCTTMAC CDCCDCCDCT GGCDCTGCTG GCTGGTGCTA TGCTGCTCCT C
30. TTCCGCCTCT AGAGAATTCT TARTACAGRG THGGHGCCAG GAGGAGCAGC ATAGCACCAG CC
31. AAGCAGCCAG GTGAGCAGCG TCHGGRATRG ARGTHGCGGT GGTCGGAGCC GGCAGGTT
32. CGCAACCAGC GAAAGCCATG ATRTGHGCHA CRAARGTYTC TTCTTCAACA CCGTGAACCA
33. GCGAAAACAG CAACGTCTTC RCCRCCRTGR GTYTCRGAHG CCTGCGGAAC AGCAGCCTGC
34. AGAGGTAGAG TCGTTAACGT CHGGRCGRGA RCCRCCRCCC AGAGCGTAAC CCGGACCGTT
35. AAGATTTAGA GTCCAGAGCT TTRGAHGGHG CCAGRCCRAA GATAGAGGTA CGACGCAGGG
36. ACGTGAGAGT GGTCAGCGGT HACCAGRATC AGRGTRTCCA GTTCAGAGGT CAGTTCGTTA
37. GAACATACCA GCTTCGGTCA GHGCCATRTA HGCYTTRTCG TCGTGGTGAC CGTGGTCGAT
38. GGTAGAAACC ACGCGGGTTA CGRGAHACHA CRCGCAGHGC AACTTCGGTC ATTTCCTGCA
39. TCCTGCTGAA CGTTGTATTT CATRTCHGCH GGYTCRAACA GACCCATCAG GTGGGTAACA
40. CAGCAGAGCG GTACGGTTCC AHACRTAYTG HGCRCCYTGG TGTTTAGCCT GCCAAGCCTG
41. TACGAACACC GTTAACAGAA GCRTCRTCHG GRTAYTCHGG GTCCGGGGTA CCAACCGGGA
42. CCCAGGATAA CGTCGATGTC CATRTTRTTH ACCAGYTGHG CAGCGATGTC CTGGCAACCG
43. CAGGTCAGCG TCAGAGTACC ARTTRCGRTT HACRGTRTGA GCGTAAGCAC CAGCCGGAGA
44. TGGTAACAAC ACCAACAGAT TTRCCHGCYT TYTTHGCRCG GTTCATAACA GAGGTAACTT
45. CACTGGTTGT AACGAGCAGC HGCRGAHACR CCRATRGTRC GGTAGTTACC TTTAACACCG
46. ACCAGCAGAG TCCGGAACCT GRCGRTCHAC RTTRTARGTT TTAGACAGAG CAACGTACGG
47. GGGTTTCCGG ACCCAGTTTA CCRTTCATYT GRCCYTTCAG GATACGGGTA GCGGTAACGG
48. CCCAGGAACA GGATAACGTT YTTHGCHGCR GTYTGRATHG GCTGCAGTTT TTTAGCAACG
49. ACGGTTCCAG AAAGCCGGGT CTTCCTCTTC AACCGGAACC AG
50. CCTGAGCAGA CATAACACCA GCHGCHACHG CHACHGCCAG CGGCAGTTTA CGCAGGGTGA
51. ACCGGGGTGA ACAGCAGCGG CAGCAGHGCC AGHGCRATRG TRGACTGTTT CATATGTATATC
52. GCCGGCTGAG CAGCCAGCAG CAGCAGRCCH GCHGCHGCGG TCGGCAGCAG GTAGTTTCA
53. AAGAGATAGC GATCGGGGTG GTCAGHACRA TRCCCAGCAG TTTAGCACGC ATATGTATAT
54. CAACGGTAGC GAAACCAGCC AGHGCHACHG CRATHGCRAT AGCGG I UN TTCATATG
55. AGAATTCTCT AGAGGCGGAA ACTCTCCAAC TCCCAGGTT
56. TGAGAGGTTG AGGGTCCAAT TGGGAGGTCA AGGCTTGGG

All oligonucleotides are listed 5' to 3'. The code for degenerate positions is: R: A or G; Y: C or T;
H: A or C or T; D: A or G or T.
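The degeneracy code in the legend determines how many distinct DNA sequences each oligonucleotide represents, which is one way to gauge the size of the codon usage library. A minimal sketch using only the four symbols defined in the legend; a symbol outside it (for example the M in oligonucleotide 29, which may be a scanning error) would raise a `KeyError`:

```python
from itertools import product
from math import prod

# Degenerate-base code used in Table II; plain bases stand for themselves.
CODE = {"A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "H": "ACT", "D": "AGT"}


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    return prod(len(CODE[base]) for base in oligo.replace(" ", ""))


def expand(oligo):
    """Yield every concrete sequence the degenerate oligonucleotide can represent."""
    choices = [CODE[base] for base in oligo.replace(" ", "")]
    for combo in product(*choices):
        yield "".join(combo)


core = "CDATGGCDGT DGAYTTYCAY"  # degenerate central 20 bases of oligonucleotide 6
print(degeneracy(core))          # 216
print(sorted(expand("GAYTTY")))  # ['GACTTC', 'GACTTT', 'GATTTC', 'GATTTT']
```

The four expansions of GAYTTY all encode Asp-Phe (GAY codes only for Asp and TTY only for Phe), illustrating how degenerate codons vary the DNA while preserving the wild-type protein sequence.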

II. Mammalian Surface Display

During an immune response, antibodies naturally undergo a process of affinity
maturation resulting in mutant antibodies with improved affinities for their cognate antigens.
This process is driven by somatic hypermutation of antibody genes coupled with clonal
selection (Berek and Milstein, Immunol. Rev. 96:23-41 (1987)). Patten et al. (Science
271:1086-1091 (1996)) have reconstructed the progression of a catalytic antibody from the
germline sequence, which binds a p-nitrophenylphosphonate hapten with an affinity (Kd) of 135
micromolar, to the affinity matured sequence, which has acquired nine somatic mutations and
binds with an affinity of 10 nanomolar. The affinity maturation of this antibody can be
recapitulated and improved upon using cassette mutagenesis of the CDRs (or random
mutagenesis such as with PCR), mammalian display, FACS selection for improved binding,
and RSR to rapidly evolve improved affinity by recombining mutations encoding improved
binding.
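The size of this affinity improvement can be expressed as a binding free energy using ddG = RT ln(Kd,germline / Kd,mature). The sketch assumes 25 C; the two Kd values are those quoted above:

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.15          # assumed temperature, K (25 C)

kd_germline = 135e-6  # M, germline antibody
kd_mature = 10e-9     # M, affinity matured antibody

fold = kd_germline / kd_mature
ddg = R_KCAL * T * math.log(fold)  # binding free energy gained, kcal/mol
print(f"{fold:.0f}-fold tighter binding, about {ddg:.1f} kcal/mol")
```

Spread over the nine somatic mutations this averages roughly 0.6 kcal/mol each, although individual mutations need not contribute equally or additively.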
Genomic antibody expression shuttle vectors similar to those described by
Gascoigne et al. (Proc. Natl. Acad. Sci. (U.S.A.) 84:2936-2940 (1987)) are constructed such
that libraries of mutant V region exons can be readily cloned into the shuttle vectors. The
kappa construct is cloned onto a plasmid encoding puromycin resistance and the heavy
chain is cloned onto a neomycin resistance encoding vector. The cDNA derived variable
region sequences encoding the mature and germline heavy and light chain V regions are
reconfigured by PCR mutagenesis into genomic exons flanked by Sfi I sites, with
complementary Sfi I sites placed at the appropriate locations in the genomic shuttle vectors.
The oligonucleotides used to create the intronic Sfi I sites flanking the VDJ exon are: 5' Sfi I:
5'-TTCCATTTCA TACATGGCCG AAGGGGCCGT GCCATGAGGA TTTT-3'; 3' Sfi I:
5'-TTCTAAATG CATGTTGGCC TCCTTGGCCG GATTCTGAGC CTTCAGGACC A-3'.
Standard PCR mutagenesis protocols are applied to produce libraries of mutants wherein the
following sets of residues (numbered according to Kabat, Sequences of Proteins of
Immunological Interest, U.S. Dept. of Health and Human Services, 1991) are randomized to
NNK codons (N = G, A, T or C; K = G or T):



Chain   CDR   Mutated residues
V-L     1     30, 31, 34
V-L     2     52, 53, 55
V-H     2     55, 56, 65
V-H     4     74, 76, 78
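Two features of this library design can be checked mechanically. Randomizing a codon to NNK gives 32 codons, which still cover all 20 amino acids but include only one stop codon (TAG), and each Sfi I oligonucleotide should carry the recognition site GGCCNNNNNGGCC. A minimal sketch; the three sites per library come from the table above, and library sizes are DNA-level counts before any transfection bottleneck:

```python
import re
from itertools import product

# NNK: N = A, C, G or T at the first two positions, K = G or T at the third.
NNK = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
STOPS = {"TAA", "TAG", "TGA"}


def library_size(n_sites):
    """DNA-level diversity when n_sites codons are each randomized to NNK."""
    return len(NNK) ** n_sites


# Sfi I recognition site GGCCNNNNNGGCC.
SFI_I = re.compile(r"GGCC[ACGT]{5}GGCC")

primers = {  # Sfi I oligonucleotides from the text, spaces removed
    "5' Sfi I": "TTCCATTTCATACATGGCCGAAGGGGCCGTGCCATGAGGATTTT",
    "3' Sfi I": "TTCTAAATGCATGTTGGCCTCCTTGGCCGGATTCTGAGCCTTCAGGACCA",
}

print(len(NNK), sorted(set(NNK) & STOPS), library_size(3))  # 32 ['TAG'] 32768
for name, seq in primers.items():
    match = SFI_I.search(seq)
    print(name, match.group() if match else "no Sfi I site")
```

Each three-residue library therefore has 32,768 DNA variants, comfortably within the range that FACS can sample.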



Stable transfectant lines are made for each of the two light and heavy chain
constructs (mature and germline) using the B cell myeloma AG8-653 (a gift from J. Kearney)
as a host using standard electroporation protocols. Libraries of mutant plasmids encoding
the indicated libraries of V-L mutants are transfected into the stable transfectant expressing
the germline V-H, and the V-H mutants are transfected into the germline V-L stable
transfectant line. In both cases, the libraries are introduced by protoplast fusion (Sambrook
et al., Molecular Cloning, CSH Press (1987)) to ensure that the majority of transfected cells
receive one and only one mutant plasmid sequence (which would not be the case for
electroporation, where the majority of the transfected cells would receive many plasmids,
each expressing a different mutant sequence).
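Why single-plasmid delivery matters can be illustrated with a Poisson model of plasmid uptake, an assumption introduced here for illustration rather than a measured property of protoplast fusion or electroporation. Among cells that take up at least one plasmid, the fraction with exactly one is m e^(-m) / (1 - e^(-m)) for mean uptake m:

```python
import math


def fraction_single_copy(mean_copies):
    """Among cells receiving >= 1 plasmid, the fraction that received exactly one.

    Assumes per-cell plasmid uptake is Poisson-distributed with mean `mean_copies`.
    """
    p0 = math.exp(-mean_copies)   # P(no plasmid)
    p1 = mean_copies * p0         # P(exactly one plasmid)
    return p1 / (1.0 - p0)


for m in (0.1, 1.0, 5.0):
    print(m, round(fraction_single_copy(m), 3))
```

At low mean uptake about 95% of transfected cells carry a single mutant, so a sorted cell's phenotype can be traced to one sequence; at a mean of five, only about 3% do.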

The p-nitrophenylphosphonate hapten (JWJ-1) recognized by this antibody is
synthesized as described by Patten et al. (Science 271:1086-1091 (1996)). JWJ-1 is
coupled directly to 5-(((2-aminoethyl)thio)acetyl)fluorescein (Molecular Probes, Inc.) by
formation of an amide bond using a standard coupling chemistry such as EDAC (March,
Advanced Organic Chemistry, Third edition, John Wiley and Sons, 1985) to give a
monomeric JWJ-1-FITC probe. A "dimeric" conjugate (two molecules of JWJ-1 coupled to a
FACS marker) is made in order to get a higher avidity probe, thus making low affinity
interactions (such as with the germline antibody) more readily detected by FACS. This is
generated by staining with Texas Red conjugated to an anti-fluorescein antibody in the
presence of two equivalents of JWJ-1-FITC. The bivalent structure of IgG then provides a
homogeneous bivalent reagent. A spin column is used to remove excess JWJ-1-FITC
molecules that are not bound to the anti-FITC reagent. A tetravalent reagent is made as
follows. One equivalent of biotin is coupled with EDAC to two equivalents of
ethylenediamine, and this is then coupled to the free carboxylate on JWJ-1. The
biotinylated JWJ-1 product is purified by ion exchange chromatography and characterized by
mass spectrometry. FITC labelled avidin is incubated with the biotinylated JWJ-1 in order to
generate a tetravalent probe.

The FACS selection is performed as follows, according to a protocol similar to
that of Panka et al. (Proc. Natl. Acad. Sci. (U.S.A.) 85:3080-3084 (1988)). After transfection
of libraries of mutant antibody genes by protoplast fusion (with recovery for 36-72 hours),
the cells are incubated on ice with fluorescently labelled hapten. The incubation
is done on ice to minimize pinocytosis of the FITC conjugate, which may contribute to
nonspecific background. The cells are then sorted on the FACS either with or without a
washing step. Sorting without a washing step is preferable because the off rate for the
germline antibody prior to affinity maturation is expected to be very fast (>0.1 s⁻¹; Patten
et al., Science 271:1086-1091 (1996)); a washing step adds a complicating variable. The
brightest 0.1-10% of the cells are collected.
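The case for sorting without a wash can be illustrated with first-order dissociation kinetics. The minimal sketch below assumes that, once free probe is removed, bound hapten decays as exp(-k_off·t) with no rebinding. The germline off rate is the >0.1 s⁻¹ lower bound quoted above; the slower "matured" off rate and the wash-to-sort delays are hypothetical values chosen only for illustration.

```python
import math

def fraction_still_bound(k_off_per_s: float, seconds: float) -> float:
    """Fraction of initially bound hapten left after `seconds`, assuming
    first-order dissociation and no rebinding (free probe already removed)."""
    return math.exp(-k_off_per_s * seconds)

GERMLINE_K_OFF = 0.1    # s^-1, lower bound quoted in the text
MATURED_K_OFF = 0.001   # s^-1, hypothetical affinity-matured variant

for delay_s in (30, 300):  # hypothetical wash-to-sort delays
    g = fraction_still_bound(GERMLINE_K_OFF, delay_s)
    m = fraction_still_bound(MATURED_K_OFF, delay_s)
    print(f"{delay_s:>4} s: germline {g:.2e}, matured {m:.2f}")
```

With k_off = 0.1 s⁻¹, only about 5% of the germline signal remains after a 30 s delay, so any wash-to-sort interval strongly penalizes the weakest binders.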

Four parameters are manipulated to optimize the selection for increased
binding: monomeric vs. dimeric vs. tetrameric hapten; concentration of hapten used in the
staining reaction (low concentration selects for high affinity, i.e., low Kd); time between washing and
FACS (longer time selects for low off rates); and selectivity in the gating (i.e., take the top
0.1% to 10%, more preferably the top 0.1%). The constructs expressing the germline,
mature, and both half-germline combinations are used as controls to optimize this
selectivity.
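Two of these parameters, hapten concentration and gate stringency, lend themselves to a quick numerical illustration. The sketch below assumes simple 1:1 equilibrium binding with no ligand depletion, so occupancy is θ = [L]/([L] + Kd). The Kd values, hapten concentrations and per-cell intensities are hypothetical, and `gate_brightest` is an illustrative helper, not part of any FACS software.

```python
def occupancy(ligand_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of sites occupied (1:1 binding, no depletion)."""
    return ligand_nM / (ligand_nM + kd_nM)

def gate_brightest(intensities: dict[str, float], top_fraction: float) -> list[str]:
    """Return IDs of the brightest cells, keeping at least one cell."""
    ranked = sorted(intensities, key=intensities.get, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:keep]

# Hypothetical dissociation constants (nM) for a weak and an improved binder.
weak_kd, strong_kd = 1000.0, 10.0
for hapten_nM in (1000.0, 10.0):
    ratio = occupancy(hapten_nM, strong_kd) / occupancy(hapten_nM, weak_kd)
    print(f"[hapten] = {hapten_nM:g} nM: strong/weak occupancy ratio = {ratio:.1f}")

# Hypothetical per-cell fluorescence values; keep the top 10%.
cells = {f"cell{i}": float(i) for i in range(1, 21)}
print(gate_brightest(cells, 0.10))  # the two brightest cells
```

At 1000 nM hapten the two variants stain within about 2-fold of each other, whereas at 10 nM the higher-affinity variant is about 50-fold brighter, which is why low hapten concentrations favor high-affinity binders.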

Plasmids are recovered from the FACS-selected cells by transformation of
an E. coli host with Hirt supernatants. Alternatively, the mutant V gene exons are PCR-amplified
from the FACS-selected cells. The recovered V gene exons are subjected to RSR,
recloned into the corresponding genomic shuttle vector, and the procedure is recursively
applied until the mean fluorescence intensity has increased. A relevant positive control for
improved binding is transfection with the affinity-matured 48G7 exons (Patten et al., op. cit.).

In a further experiment, equal numbers of germline and each of the two half-germline
transfectants are mixed. The brightest cells are selected under the conditions
described above. The V genes are recovered by PCR, recloned into expression vectors, and
co-transfected, either as two plasmids per E. coli cell followed by protoplast fusion, or by bulk
electroporation. The mean fluorescence intensity of the transfectants should increase due to
enrichment of mature relative to germline V regions.

This methodology can be applied to evolve any receptor-ligand or binding
partner interaction. Natural expression formats can be used to express libraries of mutants
of any receptor whose affinity for natural or novel ligands is to be improved.
Typical examples would be improvement of the affinity of T cell receptors for ligands of
interest (e.g., MHC/tumor peptide antigen complexes) or of TNF receptor for TNF (soluble forms
of TNF receptors are used therapeutically to neutralize TNF activity).

This format can also be used to select for mutant forms of ligands by
expressing the ligand in a membrane-bound form with an engineered membrane anchor, by a
strategy analogous to that of Wettstein et al. (J. Exp. Med. 174:219-28 (1991)). FACS
selection is then performed with fluorescently labelled receptor. In this format one could, for
example, evolve improved receptor antagonists from naturally occurring receptor antagonists
(IL1 receptor antagonist, for example). Mutant forms of agonists with improved affinity for
their cognate receptors could also be evolved in this format. These mutants would be
candidates for improved agonists or potent receptor antagonists, analogous to reported
antagonistic mutant forms of IL3.
III. Evolution of Alpha Interferon

There are 18 known non-allelic human interferon-alpha (IFN-α)
genes, with highly related primary structures (78-95% identical) and a broad range of
biological activities. Many hybrid interferons with interesting biological activities differing from
those of the parental molecules have been described (reviewed by Horisberger and Di Marco, Pharm.
Ther. 66:507-534 (1995)). A consensus human alpha interferon, IFN-Con1, has been
constructed synthetically by placing at each position the most common residue among fourteen
known IFN-αs, and it compares favorably with the naturally occurring interferons
(Ozes et al., J. Interferon Res. 12:55-59 (1992)). This IFN contains 20 amino acid changes
relative to IFN-α2a, the IFN-α to which it is most closely related. IFN-Con1 has 10-fold
higher specific antiviral activity than any known natural IFN subtype. IFN-Con1 has in vitro
activities 10- to 20-fold higher than those of recombinant IFN-α2a (the major IFN used
clinically) in antiviral, antiproliferative and NK cell activation assays. Thus, there is considerable
interest in producing interferon hybrids that combine the most desirable traits of two or
more interferons. However, given the enormous number of potential hybrids and the lack of
a crystal structure of IFN-α or of the IFN-α receptor, there is a perceived impasse in the
development of novel hybrids (Horisberger and Di Marco, Pharm. Ther. 66:507-534 (1995)).
The biological effects of IFN-αs are diverse, and include such properties as
induction of an antiviral state (induction of factors that arrest translation and degrade mRNA);
inhibition of cell growth; induction of Class I and Class II MHC; activation of monocytes and
macrophages; activation of natural killer cells; activation of cytotoxic T cells; modulation of Ig
synthesis in B cells; and pyrogenic activity.

The various IFN-α subtypes have unique spectra of activities on different
target cells and unique side effect profiles (Ortaldo et al., Proc. Natl. Acad. Sci. (U.S.A.)
81:4926-4929 (1984); Overall et al., J. Interferon Res. 12:281-288 (1992); Fish and Stebbing,
Biochem. Biophys. Res. Comm. 112:537-546 (1983); Weck et al., J. Gen. Virol. 57:233-237
(1981)). For example, human IFNa has very mild side effects but low antiviral activity.
Human IFNaS has very high antiviral activity, but relatively severe side effects. Human
IFN-α7 lacks NK activity and blocks NK stimulation by other IFN-αs. Human IFN-αJ lacks the
ability to stimulate NK cells, but it can bind to the IFN-α receptor on NK cells and block the
stimulatory activity of IFN-αA (Langer et al., J. Interferon Res. 6:97-105 (1986)).

The therapeutic applications of interferons are limited by diverse and severe
side effect profiles, which include flu-like symptoms, fatigue, neurological disorders including
hallucination, fever, hepatic enzyme elevation, and leukopenia. The multiplicity of effects of
IFN-αs has stimulated the hypothesis that there may be more than one receptor, or a
multicomponent receptor, for the IFN-α family (R. Hu et al., J. Biol. Chem. 268:12591-12595
(1993)). Thus, the existence of abundant naturally occurring diversity within the human
alpha IFNs (and hence a large sequence space of recombinants), along with the complexity
of the IFN-α receptors and activities, creates an opportunity for the construction of superior
hybrids.

A. Complexity of the Sequence Space

Figure 2 shows the protein sequences of 11 human IFN-αs. The differences
from consensus are indicated. Those positions where a degenerate codon can capture all of
the diversity are indicated with an asterisk. Examination of the aligned sequences reveals
that there are 57 positions with two, 15 positions with three, and 4 positions with four
possible amino acids encoded in this group of alpha interferon genes. Thus, the potential
diversity encoded by permutation of all of this naturally occurring diversity is
$2^{57} \times 3^{15} \times 4^{4} \approx 5.3 \times 10^{26}$. Of the 76 polymorphisms spread over a total of 175 sites in
the 11 interferon genes, 171 of the 175 changes can be incorporated into homologue
libraries using single degenerate codons at the corresponding positions. For example, Arg,
Trp and Gly can all be encoded by the degenerate codon (A,T,G)GG. Using such a strategy,
1.3 x 10 M hybrids can be captured with a single set of degenerate oligonucleotides. As is
evident from Tables III to VI, 27 oligonucleotides are sufficient to shuffle all eleven human
alpha interferons. Virtually all of the natural diversity is thereby encoded and fully permuted
due to degeneracies in the nine "block" oligonucleotides in Table V.
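The diversity estimate and the (A,T,G)GG example can both be checked mechanically. The sketch below assumes the standard genetic code; `expand_degenerate_codon` is an illustrative helper name, not a tool from the source.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered by first, second, third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def expand_degenerate_codon(positions: list[str]) -> dict[str, str]:
    """Map every concrete codon allowed by a degenerate codon to its amino acid.

    Each element of `positions` lists the bases allowed at that position,
    e.g. ["ATG", "G", "G"] for (A,T,G)GG.
    """
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*positions)}

# 57 two-way, 15 three-way and 4 four-way polymorphic positions.
diversity = 2**57 * 3**15 * 4**4
print(f"{diversity:.1e}")  # 5.3e+26

print(expand_degenerate_codon(["ATG", "G", "G"]))
# {'AGG': 'R', 'TGG': 'W', 'GGG': 'G'}
```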

B. Properties of a "Coarse Grain" Search of Homologue Sequence Space

The modelled structure of IFN alpha (Kontsek, Acta Virol. 38:345-360 (1994))
has been divided into nine segments based on a combination of two criteria: maintaining
secondary structure elements as single units, and placing the
segment boundaries in regions of high identity. Hence, one can capture the whole family
with a single set of mildly degenerate oligonucleotides. Table III and Figure 2 give the
precise locations of these boundaries at the protein and DNA levels, respectively. It should
be emphasized that this particular segmentation scheme is arbitrary and that other
segmentation schemes could also be pursued. The general strategy does not depend on
placement of recombination boundaries at regions of high identity between the family
members or on any particular algorithm for breaking the structure into segments.

Table III

Segmentation Scheme for Alpha Interferon

Segment   Amino Acids   # Alleles   # Permutations of All Sequence Variations
1         1-21          5           1024
2         22-51         10          $6.2 \times 10^{4}$
3         52-67         6           96
4         68-80         7           1024
5         81-92         7           192
6         93-115        10          $2.5 \times 10^{5}$
7         116-131       4           8
8         132-138       4           8
9         139-167       9           9216


Many of the IFNs are identical over some of the segments, and thus there are
fewer than eleven different "alleles" of each segment. Thus, a library consisting of the
permutations of the segment "alleles" would have a potential complexity of $2.1 \times 10^{7}$
(5 segment 1 alleles x 10 segment 2 alleles x ... x 9 segment 9 alleles). This is far more than can be
examined in most of the screening procedures described, and thus this is a good problem for
using RSR to search the sequence space.
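The library complexity quoted above is simply the product of the allele counts in Table III. The short sketch below reproduces it and, as an illustration, computes the fraction of this space that a screen of 1,000 clones (the example number of clones given in the coarse grain protocol) would sample, assuming every screened clone is a distinct hybrid.

```python
from math import prod

# Number of distinct segment "alleles" for segments 1-9 (Table III).
ALLELES_PER_SEGMENT = [5, 10, 6, 7, 7, 10, 4, 4, 9]

library_complexity = prod(ALLELES_PER_SEGMENT)
print(library_complexity)            # 21168000
print(f"{library_complexity:.1e}")   # 2.1e+07

# Fraction of this space sampled by a hypothetical screen of 1,000 clones.
coverage = 1000 / library_complexity
print(f"{coverage:.1e}")             # 4.7e-05
```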
C. Detailed Strategies for Using RSR to Search the IFN-Alpha Homologue Sequence Space

The methods described herein for oligo-directed shuffling (i.e., bridge
oligonucleotides) are employed to construct libraries of interferon alpha hybrids, and the
general methods described above are employed to screen or select these mutants for
improved function. As there are numerous formats in which to screen or select for improved
interferon activity, many of which depend on the unique properties of interferons, exemplary
IFN-based assays are described below.

D. A Protocol for a Coarse Grain Search of Hybrid IFN Alpha Genes



In brief, libraries are constructed wherein the 11 homologous forms of the
nine segments are permuted (note that in many cases two homologues are identical over a
given segment). All nine segments are PCR-amplified out of all eleven IFN alpha genes
with the eighteen oligonucleotides listed in Table IV, and reassembled into full-length genes
with oligo-directed recombination. An arbitrary number of clones from the library (e.g., 1000)
are prepared in a 96-well expression/purification format and screened, and the hybrids with
the most potent antiviral activities are selected. Nucleic acid is recovered from these hybrids
by PCR amplification and subjected to recombination using bridge oligonucleotides. These
steps are repeated until candidates with desired properties are obtained.
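The screen-and-recombine cycle of this protocol can be illustrated with a toy in-silico model. This is a sketch under strong, hypothetical assumptions: each hybrid is represented by one allele index per segment (allele counts from Table III), its "activity" is an additive random score standing in for the antiviral assay, and oligo-directed recombination is modelled as drawing each segment from a randomly chosen parent among the selected hits. It illustrates the search logic only and does not model real IFN activity.

```python
import random

# Distinct alleles per segment, from Table III.
ALLELES_PER_SEGMENT = [5, 10, 6, 7, 7, 10, 4, 4, 9]

def make_toy_fitness(seed: int) -> list[list[float]]:
    """Hypothetical additive fitness: one random contribution per segment allele."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for n in ALLELES_PER_SEGMENT]

def score(hybrid: tuple[int, ...], table: list[list[float]]) -> float:
    """Toy 'antiviral activity' of a hybrid given as one allele index per segment."""
    return sum(table[seg][allele] for seg, allele in enumerate(hybrid))

def recombine(parents: list[tuple[int, ...]], rng: random.Random) -> tuple[int, ...]:
    """Model segment-wise recombination: each segment comes from a random parent."""
    return tuple(rng.choice(parents)[seg] for seg in range(len(ALLELES_PER_SEGMENT)))

def coarse_grain_search(rounds: int = 5, library_size: int = 1000,
                        keep: int = 10, seed: int = 0) -> list[float]:
    rng = random.Random(seed)
    table = make_toy_fitness(seed)
    # Initial library: random permutations of the segment alleles.
    library = [tuple(rng.randrange(n) for n in ALLELES_PER_SEGMENT)
               for _ in range(library_size)]
    best_per_round = []
    for _ in range(rounds):
        # Screen: keep the top `keep` hybrids.
        hits = sorted(library, key=lambda h: score(h, table), reverse=True)[:keep]
        best_per_round.append(score(hits[0], table))
        # Recombine the hits; carrying them over means the best never gets worse.
        library = hits + [recombine(hits, rng) for _ in range(library_size - keep)]
    return best_per_round

print(coarse_grain_search())
```

Because the selected hits are carried into the next library, the best score per round can only stay the same or improve.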

E. Strategies for Examining the Space of Fine Grain Hybrids

In brief, each of the nine segments is synthesized with one degenerate oligo
per segment. Degeneracies are chosen to capture all of the IFN-alpha diversity that can be
captured with a single degenerate codon without adding any non-natural sequence. A
second set of degenerate oligonucleotides encoding the nine segments is generated wherein
all of the natural diversity is captured, but additional non-natural mutations are included at
positions where necessitated by the constraints of the genetic code. In most cases all of the
diversity can be captured with a single degenerate codon; in some cases a degenerate
codon will capture all of the natural diversity but will add one non-natural mutation; at a few
positions it is not possible to capture the natural diversity without using a highly
degenerate codon that will create more than one non-natural mutation. It is at these
positions that the second set of oligonucleotides differs from the first set by being more
inclusive. Each of the nine synthetic segments is then amplified by PCR with the 18 PCR
oligonucleotides. Full-length genes are generated using the oligo-directed recombination method,
transfected into a host, and assayed for hybrids with desired properties. The best
hybrids (e.g., the top 10%, 1% or 0.1%; preferably the top 1%) are subjected to RSR,
and the process is repeated until a candidate with the desired properties is obtained.
F. "Non-Gentle" Fine Grain Search

On the one hand, one could make libraries wherein each segment is derived
from the degenerate synthetic oligonucleotides, which will encode random permutations of
the homologue diversity. In this case, the initial library will very sparsely search the space of
>10 a fine grain hybrids that are possible with this family of genes. One could
proceed by breeding positives together from this search. However, there would be a large
number of differences between independent members of such libraries, and consequently
the breeding process would not be very "gentle", because pools of relatively divergent genes
would be recombined at each step.
G. "Gentle" Fine Grain Search

One way to make this approach more "gentle" would be to obtain a candidate
starting point and to search gently from there. This starting point could be one of the
natural IFN-alphas (such as IFN alpha-2a, which is the one most widely used
therapeutically), the characterized IFN-Con1 consensus interferon, or a hit from screening
the shuffled IFN-alphas described above. Given a starting point, one would make separate
libraries wherein the degenerate segment libraries are bred one at a time into the founder
sequence. Improved hits from each library would then be bred together to gently build up
mutations throughout the molecule.

H. Functional Cellular Assays

The following assays, well known in the art, are used to screen IFN alpha
mutants: inhibition of viral killing (standard error of 30-50%); inhibition of plaque-forming units
(very low standard error, so small effects can be measured); reduced viral yield (useful for nonlethal,
non-plaque-forming viruses); inhibition of cell growth (³H-thymidine uptake assay); activation
of NK cells to kill tumor cells; and suppression of tumor formation by human IFN administered to
nude mice engrafted with human tumors (skin tumors, for example).

Most of these assays are amenable to high throughput screening. Libraries of
recombinant IFN alpha mutants are expressed and purified in high throughput formats such
as expression, lysis and purification in a 96-well format using anti-IFN antibodies or an
epitope tag and affinity resin. The purified IFN preparations are screened in a high
throughput format, scored, and the mutants encoding the highest activities of interest are
subjected to further mutagenesis, such as RSR, and the process repeated until a desired
level of activity is obtained.

I. Phage Display 

Standard phage display formats are used to display biologically active IFN.
Libraries of chimeric IFN genes are expressed in this format and are selected (positively or
negatively) for binding (or reduced binding) to one or more purified IFN receptor preparations
or to one or more IFN receptor expressing cell types.

J. GFP or Luciferase Under Control of an IFN-Alpha Dependent Promoter

Protein expressed by mutants can be screened in high throughput format on a
reporter cell line that expresses GFP or luciferase under the control of an IFN alpha
responsive promoter, such as an MHC Class I promoter driving GFP expression.

K. Stimulation of Target Cells with Intact Interferon Phage Particles

Purification of active IFN will limit the throughput of the assays described
above. Expression of active IFN alpha on filamentous phage M13 would allow one to obtain
homogeneous preparations of IFN mutants in a format where thousands or tens of thousands
of mutants could readily be handled. Gram et al. (J. Immunol. Methods 161:169-176 (1993)) have
demonstrated that human IL3, a cytokine with a protein fold similar in topology to IFN alpha,
can be expressed on the surface of M13 and that the resultant phage can present active IL3
to IL3-dependent cell lines. Similarly, Saggio et al. (Gene 152:35-39 (1995)) have shown
that human ciliary neurotrophic factor, a four-helix bundle cytokine, is biologically active when
expressed on phage at concentrations similar to those of the soluble cytokine. Analogously,
libraries of IFN alpha mutants can be expressed on M13, and phage stocks of defined titre
used to present biologically active IFN in the high throughput assays and selections
described herein.

The following calculation supports the feasibility of applying this technology to
IFN alpha. Assume (1) titres of $1 \times 10^{10}$ phage/ml with five active copies of interferon
displayed per phage, and (2) that the displayed interferon is as active as soluble
recombinant interferon (it may well be more potent due to multivalency). The question then is
whether one can reasonably expect to see biological activity from such phage stocks.

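$$(1 \times 10^{10}\ \text{phage/ml}) \times (5\ \text{IFN molecules/phage}) \times \left(\frac{1\ \text{mole}}{6 \times 10^{23}\ \text{molecules}}\right) \times (26{,}000\ \text{g/mole}) \times (10^{9}\ \text{ng/g}) \approx 2.2\ \text{ng/ml}$$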
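The arithmetic of this estimate can be reproduced directly. The sketch uses the values stated in the text (titre, five copies per phage, 26,000 g/mole) with Avogadro's number taken as 6.022e23; the constant names are illustrative.

```python
AVOGADRO = 6.022e23      # molecules per mole
PHAGE_PER_ML = 1e10      # titre assumed in the text
IFN_PER_PHAGE = 5        # copies displayed per phage (gene III format)
IFN_G_PER_MOLE = 26_000  # molecular weight used in the text
NG_PER_G = 1e9

ifn_ng_per_ml = PHAGE_PER_ML * IFN_PER_PHAGE / AVOGADRO * IFN_G_PER_MOLE * NG_PER_G
print(f"{ifn_ng_per_ml:.1f} ng/ml")  # 2.2 ng/ml
```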

The range of concentrations used in biological assays is: 1 ng/ml for NK
activation, 0.1-10 ng/ml for antiproliferative activity on Eskol cells, and 0.1-1 ng/ml on
Daudi cells (Ozes et al., J. Interferon Res. 12:55-59 (1992)). Although some subtypes are
glycosylated, interferon alpha2a and consensus interferon are expressed in active
recombinant form in E. coli, so at least these two do not require glycosylation for activity.
Thus, IFN alpha expressed on filamentous phage is likely to be biologically active as phage
stocks without further concentration. Libraries of IFN chimeras are expressed in phage
display formats and scored in the assays described above and below to identify mutants with
improved properties to be put into further rounds of RSR.
When one phage is sufficient to activate one cell due to the high valency state
of the displayed protein (five per phage in the gene III format; hundreds per phage in the
gene VIII format; tens in the lambda gene V format), a phage stock can be used directly
at suitable dilution to stimulate cells with a GFP reporter construct under the control of an
IFN-responsive promoter. Assuming that the phage remain attached to the cells through stimulation,
reporter expression and FACS purification of the responsive cells, one could then directly FACS-purify
hybrids with improved activity from very large libraries (up to and perhaps more than $10^{7}$
phage per FACS run).

A second way in which FACS is used to advantage in this format is the
following. Cells can be stimulated in a multiwell format with one phage stock per well and a
GFP-type reporter construct. All stimulated cells are FACS-purified to collect the brightest
cells, and the IFN genes are recovered and subjected to RSR, with iteration of the protocol until
the desired level of improvement is obtained. In this protocol the stimulation is performed
with individual concentrated lysates, and hence the requirement that a single phage be
sufficient to stimulate the cell is relaxed. Furthermore, one can gate to collect the brightest
cells, which, in turn, should have the most potent phage attached to them.
L. Cell Surface Display Protocol for IFN Alpha Mutants

A sample protocol follows for the cell surface display of IFN alpha mutants.
This form of display has at least two advantages over phage display. First, the protein is
displayed by a eukaryotic cell and hence can be expressed in a properly glycosylated form,
which may be necessary for some IFN alphas (and other growth factors). Second, it is a
very high valency display format and is preferred for detecting activity from very weakly active
mutants.

In brief, a library of mutant IFNs is constructed wherein a polypeptide signal
for addition of a phosphoinositol tail has been fused to the carboxyl terminus, thus targeting
the protein for surface expression (Wettstein et al., J. Exp. Med. 174:219-28 (1991)). The
library is used to transfect the reporter cells described above (luciferase reporter gene) in a
microtiter format. Positives are detected with a charge-coupled device (CCD) camera.
Nucleic acids are recovered either by Hirt extraction and retransformation of the host or by PCR, and
are subjected to RSR for further evolution.

M. Autocrine Display Protocol for Viral Resistance 

A sample protocol follows for the autocrine display of IFN alpha mutants. In
brief, a library of IFN-α mutants is generated in a vector that allows for inducible
expression (e.g., a metallothionein promoter) and efficient secretion. The recipient cell line,
carrying an IFN-responsive reporter cassette (GFP or luciferase), is transfected
with the mutant IFN constructs. Mutants that stimulate the IFN-responsive promoter are
detected by FACS or CCD camera.

A variation on this format is to challenge transfectants with virus and select
for survivors. One could do multiple rounds of viral challenge and outgrowth on each set of
transfectants prior to retrieving the genes. Multiple rounds of killing and outgrowth allow an
exponential amplification of a small advantage and hence provide an advantage in detecting
small improvements in viral killing.
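The point that repeated rounds of challenge and outgrowth amplify a small advantage exponentially can be made concrete with a simple selection model. The sketch below assumes that each round multiplies the ratio of mutant-bearing to other surviving cells by a constant relative survival factor; the starting frequency and the 2-fold factor are hypothetical.

```python
def mutant_fraction(p0: float, relative_survival: float, rounds: int) -> float:
    """Fraction of transfectants carrying an improved IFN mutant after repeated
    rounds of viral challenge and outgrowth.

    Assumes each round multiplies the odds (mutant : rest) by `relative_survival`,
    the survival of mutant-bearing cells divided by that of the other cells.
    """
    odds = p0 / (1.0 - p0) * relative_survival ** rounds
    return odds / (1.0 + odds)

# Hypothetical: a 1-in-1000 mutant whose cells survive 2x better per round.
for n in (1, 5, 10):
    print(n, round(mutant_fraction(0.001, 2.0, n), 3))
```

Under these assumptions, a mutant starting at 0.1% of the population makes up roughly half of it after ten rounds, even though a single round raises its frequency only to about 0.2%.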

Table IV

Oligonucleotides needed for blockwise recombination:
oligonucleotides for alpha interferon shuffling

1. 5'-TGT[G/A]ATCTG[C/T]CT[C/G]AGACC
2. 5'-GGCACAAATG[G/A/C]G[A/C]AGAATCTCTC
3. 5'-AGAGATTCT[G/T]C[C/T/G]CATTTGTGCC
4. 5'-CAGTTCCAGAAG[A/G]CT[G/C][C/A]AGCCATC
5. 5'-GATGGCT[T/G][G/C]AG[T/C]CTTCTGGAACTG
6. 5'-CTTCAATCTCTTCA[G/C]CACA
7. 5'-TGTG[G/C]TGAAGAGATTGAAG
8. 5'-GGA[T/A][G/C]AGA[C/G][C/G]CTCCTAGA
9. 5'-TCTAGGAG[G/C][G/C]TCT[G/C][T/A]TCC
10. 5'-GAACTT[T/G/A][T/A]CCAGCAA[A/C]TGAAT
11. 5'-ATTCA[T/G]TTGCTGG[A/T][A/T/C]AAGTTC
12. 5'-GGACT[T/C]CATCCTGGCTGTG
13. 5'-CACAGCCAGGATG[G/A]AGTCC
14. 5'-AAGAATCACTCTTTATCT
15. 5'-AGATAAAGAGTGATTCTT
16. 5'-TGGGAGGTTGTCAGAGCAG
17. 5'-CTGCTCTGACAACCTCCCA
18. 5'-TCA[A/T]TTCCTT[C/A]CTC[T/C]TTAA

Brackets indicate degeneracy, with an equal mixture of the specified bases at
those positions. The purpose of the degeneracy is to allow this one set of primers to prime
all members of the IFN family with similar efficiency. The choice of the oligo-driven
recombination points is important because they will get "overwritten" in each cycle of
breeding and hence cannot coevolve with the rest of the sequence over many cycles of
selection.
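The bracket notation can be expanded mechanically, for example to count how many distinct sequences a degenerate oligonucleotide represents, or to confirm that two oligonucleotides are reverse complements (in Table IV, 16 and 17 are exact reverse complements of each other). The sketch below is illustrative: `expand_oligo` and `reverse_complement` are hypothetical helper names, and only the A/C/G/T bracket notation used in Table IV is handled.

```python
import re
from itertools import product

def expand_oligo(oligo: str) -> list[str]:
    """Expand bracket notation such as 'ATG[G/A/C]G' into all concrete sequences."""
    # Each match is either a bracketed choice group or a run of fixed bases.
    parts = re.findall(r"\[([ACGT/]+)\]|([ACGT]+)", oligo)
    choices = [group.split("/") if group else [fixed] for group, fixed in parts]
    return ["".join(p) for p in product(*choices)]

def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

oligo2 = "GGCACAAATG[G/A/C]G[A/C]AGAATCTCTC"
variants = expand_oligo(oligo2)
print(len(variants))  # 6: three choices times two choices

print(reverse_complement("TGGGAGGTTGTCAGAGCAG"))  # CTGCTCTGACAACCTCCCA (oligo 17)
```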
Table V

Oligonucleotides needed for "fine grain" recombination
of natural diversity over each of the nine blocks

Block   Length of oligo required (nucleotides)
1       76
2       95
3       65
4       56
5       51
6       93
7       50
8       62
9       80


Table VI

Amino acids that can be reached by a single-step
mutation in the codon of interest

Wild-Type Amino Acid   Amino Acids Reachable by One Mutation
W                      C, R, G, L
Y                      F, S, C, H, N, D
F                      L, I, V, S, Y, C
L                      S, W, F, I, M, V, P
V                      F, L, I, M, A, D, E, G
I                      F, L, M, V, T, N, K, S, R
A                      S, P, T, V, D, E, G
G                      V, A, D, E, R, S, C, W
M                      L, I, V, T, K, R
S                      F, L, Y, C, W, P, T, A, R, G, N, I
T                      S, P, A, I, M, N, K, R
P                      S, T, A, L, H, Q, R
C                      F, S, Y, R, G, W
N                      Y, H, K, D, S, T, I
Q                      Y, H, K, E, L, P, R
H                      Y, Q, N, D, L, P, R
D                      Y, H, N, E, V, A, G
E                      Q, K, D, V, A, G
R                      L, P, H, Q, C, W, S, G, K, T, I, M
K                      Q, N, E, R, T, I, M



Based on this Table, the polymorphic positions in IFN alpha where all of the
diversity can be captured by a degenerate codon have been identified. Oligonucleotides of
the lengths indicated in Table V above, with the degeneracies inferred from Table VI, are
synthesized.
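Entries in Table VI can be regenerated, or individual rows cross-checked, from the standard genetic code by trying every single-base change in every codon for a given amino acid. The sketch below assumes the standard code and excludes stop codons and the wild-type residue itself; `reachable_by_one_mutation` is an illustrative helper, not a tool from the source.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered by first, second, third base in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def reachable_by_one_mutation(aa: str) -> set[str]:
    """Amino acids reachable by a single base change from any codon encoding `aa`.

    Stop codons ('*') and `aa` itself are excluded from the result.
    """
    reached = set()
    for codon, encoded in CODON_TABLE.items():
        if encoded != aa:
            continue
        for pos in range(3):
            for base in BASES:
                if base != codon[pos]:
                    mutant = codon[:pos] + base + codon[pos + 1:]
                    reached.add(CODON_TABLE[mutant])
    return reached - {aa, "*"}

for wild_type in "YM":
    print(wild_type, sorted(reachable_by_one_mutation(wild_type)))
```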

N. Evolution of Improved IFN-α

1. Cloning

IFN genes were cloned by PCR amplification from genomic DNA with 12 sets
of degenerate primers by methods as discussed generally above. The PCR products were
cloned into a standard phagemid display vector as fusions to fd bacteriophage gene III.
Thirty clones were sequenced and compared to human alpha IFN genes in the literature.
Most of the sequences matched known sequences exactly or nearly exactly (>98% DNA
identity). Several clones did not match well with any known IFNs (i.e., about 93% identity)
and are candidate novel IFN genes. One gene was a clear recombinant, which presumably
was created during the PCR. Eight of the ten clones were pooled and shuffled. These eight
sequences contain about 66% of the known amino acid changes in this gene family.

2. Shuffling

The genes were shuffled as follows. Pools of 20-50 bp and 50-100 bp
fragments were prepared from partial DNase I digests as described above. Additionally, 20-100 bp
fragments were prepared from preparative PCR products of human genomic DNA
with the same set of 12 primers. These fragments should contain all sequence diversity in
the human alpha interferon locus. Chimeras were assembled by crossover PCR by 20
cycles of (94°C x 60", 6°C x 60", 25°C x 120"), followed by two rounds of 1:10 dilution into
PCR buffer and reassembly by 20 cycles of (94°C x 30", 40°C x 30", 72°C x (30+2n)"), where
n = cycle number. Full-length genes were rescued by PCR with outside primers, and the
material was cloned into the phagemid display vector by standard methods. Libraries of
$2.5 \times 10^{4}$, $3.0 \times 10^{5}$ and $2 \times 10^{8}$ complexity were obtained from the 20-50 bp, 50-100 bp and
genomic PCR fragments, respectively. Sequencing of random chimeras verified that the
shuffling had worked efficiently.
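The second reassembly stage uses an extension time that grows with cycle number. The sketch below simply enumerates that program. It assumes n is counted from 1 (the text does not say whether counting starts at 0 or 1), and `reassembly_program` is an illustrative helper name.

```python
def reassembly_program(cycles: int = 20) -> list[tuple[int, int, int, int]]:
    """(cycle, denature_s, anneal_s, extend_s) for the second reassembly stage:
    94 C for 30 s, 40 C for 30 s, 72 C for (30 + 2n) s, with n assumed 1-based."""
    return [(n, 30, 30, 30 + 2 * n) for n in range(1, cycles + 1)]

program = reassembly_program()
print(program[0], program[-1])  # (1, 30, 30, 32) (20, 30, 30, 70)
total_extension_s = sum(step[3] for step in program)
print(total_extension_s)        # 1020
```

Under that assumption the extension step grows from 32 s in cycle 1 to 70 s in cycle 20.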

3. Validation of biological activity of phagemids

Large-scale preps of phagemid particles were made by standard methods,
using M13 VCS as the helper phage. The IFN-gene III fusion genes were induced at mid-log
phase by the addition of 0.02% arabinose. The PEG-precipitated phagemid particles were
CsCl-banded and dialyzed. The phagemid particles displayed active IFN, as evidenced by the
biological activity of phagemid preps expressing IFN-Con1, IFN2a, or the eight cloned wild-type
IFNs in an antiproliferation assay on human Daudi cells (Tymms et al., Genet.
Anal. Tech. Appl. 7:53-63 (1990)).
4. Screening for improved activity in the Daudi assay

Two screening strategies were used to identify clones with improved activity:
activity assays on randomly chosen clones, and activity assays on CsCl-banded pools
followed by identification of the best clones from the most active pools.

As an example, among eight randomly chosen chimeras, three were more
active than Con1, one was intermediate between Con1 and IFN2a, and four were negative.
Figure 3 depicts the alignment of the amino acid sequences of four chimeric interferons with
IFN-Con1.

An example of pooled clones follows. Ninety-six clones were combined into
eight different pools of twelve and assayed as pools on Daudi cells. CsCl preps were made
from the twelve clones in the most active pool (P12.7, or pool "F"). One of these clones, F4,
was highly active, with activity about 60x greater than Con1 and about 1000x greater than
IFN2a. None of the parental IFNs had activity greater than Con1, so this represents an
increase of about 60-fold relative to the best parental clone. This clone has been assayed in
a human virus protection assay (WISH cells) (Jilbert et al., Microb. Pathog. 1:159-168 (1986))
and has been found to be more active than Con1 in this assay as well, thus verifying bona fide
interferon activity rather than generalized toxicity.
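The pool-then-deconvolute strategy can be sketched as a two-stage search. The example assumes that a pool's measured activity behaves like the mean activity of its members and uses made-up activity values; `pooled_screen` and `mean_activity` are hypothetical helpers, not the assay itself.

```python
from typing import Callable, Sequence

def pooled_screen(clone_ids: Sequence[int],
                  assay: Callable[[Sequence[int]], float],
                  pool_size: int = 12) -> tuple[list[int], int]:
    """Two-stage screen: assay pools, then assay each clone of the most active pool."""
    pools = [list(clone_ids[i:i + pool_size]) for i in range(0, len(clone_ids), pool_size)]
    best_pool = max(pools, key=assay)
    best_clone = max(best_pool, key=lambda c: assay([c]))
    return best_pool, best_clone

# Hypothetical activities for 96 clones: one clone stands out.
activity = {i: 1.0 for i in range(96)}
activity[40] = 50.0

def mean_activity(ids: Sequence[int]) -> float:
    """Toy pool assay: pooling dilutes each clone, so use the mean activity."""
    return sum(activity[i] for i in ids) / len(ids)

pool, clone = pooled_screen(list(range(96)), mean_activity)
print(pool[0], pool[-1], clone)  # 36 47 40
```

Pooling reduces the number of first-stage assays from 96 to 8, at the cost of 12 follow-up assays on the winning pool.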

5. Evolution for activity on mouse cells 

Eight wild-type mouse IFN genes were PCR-amplified by standard methods
and cloned into the phagemid vector. One of these clones was highly active in an
antiviral assay on mouse cells (Beilharz et al., J. Interferon Res. 9:305-314 (1988)) when
displayed in this vector. The eight human parental IFN clones and IFN2a were all inactive,
and Con1 was weakly active, in the mouse antiviral assay. One of eight randomly screened
human chimeras was more active than Con1. One of eight pools of 12 clones (pool "G")
was active in the mouse assay. Pool "G" yielded one highly active clone, G8. One of sixteen
pools of ninety-six was active. This pool of ninety-six was broken into eight pools of twelve,
and two of these pools were highly active.

6. Interpretation 

Taken together, these data show that the recombination techniques described
herein, combined with the screening methods described herein, can be used to improve the
activity of already potent interferons on human cells. Additionally, the methods can be used
to create a "related" activity (activity on mouse cells) that did not pre-exist at a detectable
level in the starting gene population. The data further demonstrate the applicability of the
instant invention for creating populations of recombinant genes with Gaussian-like
distributions of activities from which superior recombinants can be readily obtained.
IV. Evolution of an Improved Luciferase

5 The lueiferase of Photinus pyralis was PCR amplified from pGL2_basic 

(Promega Corporation, Madison, Wl). The lueiferase of Luctola nvngmlica was PCR 
amplified from pJGR (Oevine et al., Biochim. Btoohys. Acta 1173:121-132 (1993)). Both 
were doned by their start codon, encoded by Ncol, into pBAD24 (Guzman et al., J. Barter. 
177:4121-4130 (1995)). For DNAsel digestion, the lueiferase genes, including some flanking 

10 regions, were PCR amplified by the primers BADup (TGCACGGCGTCACACTTTGCTA) and 
BADdown (TACTGCCGCCAGGCAAATTCT). The PCR products were mixed in equimolar 
amounts and partially digested with DNAsel. Fragments from 70 to 280 bp were gel purified. 
Five µg of fragments were assembled in a volume of 10 µl using Taq polymerase and the
following 15 cycles in a RoboCycler: 94°C, 30 seconds; 6°C, 60 seconds; 25°C, 180
seconds. The sample was diluted 1:6 and cycled for another 20 cycles using a 1:1 mix of
Taq and Pwo polymerase in the DNA Engine (94°C, 30 seconds; 40°C, 30 seconds; 72°C,
30 seconds). The sample was diluted 1:4 and cycled for another 20 cycles using a 1:1 mix
of Taq and Pwo polymerase in the DNA Engine (94°C, 30 seconds; 40°C, 30 seconds; 72°C,
30 seconds). To amplify the assembled DNA fragments, the assembly reaction was diluted
1:10 to 1:100, and the primers #773 (TAGCGGATCCTACCTGACGC) and #297
(TGAAAATCTTCTCTCATCCG) were included for the next 25 cycles using a 1:1 mix of
Taq and Pwo polymerase in the DNA Engine (94°C, 30 seconds; 45°C, 30 seconds; 72°C,
110 seconds). The PCR products were digested with NcoI/HindIII and ligated into pCKX-GFP.
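The assembly and amplification steps above form four consecutive cycling stages, with dilutions between them. The following sketch encodes only the dilution factors, cycle numbers, and hold times stated in the text and totals them. Temperatures and ramp times are deliberately left out. The `Stage` class, the `program` helper, and the stage labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    dilution_before: int           # fold dilution of the previous reaction (1 = undiluted)
    cycles: int
    hold_seconds: Tuple[int, ...]  # hold time of each step in one cycle; ramp time ignored


def program(final_dilution: int) -> List[Stage]:
    """Stages as described in the text; `final_dilution` is 10 to 100 for the last stage."""
    return [
        Stage("assembly (Taq)", 1, 15, (30, 60, 180)),
        Stage("assembly round 2 (Taq/Pwo)", 6, 20, (30, 30, 30)),
        Stage("assembly round 3 (Taq/Pwo)", 4, 20, (30, 30, 30)),
        Stage("amplification with #773/#297", final_dilution, 25, (30, 30, 110)),
    ]


def cumulative_dilution(stages: List[Stage]) -> int:
    """Product of all dilutions applied to the original assembly reaction."""
    factor = 1
    for stage in stages:
        factor *= stage.dilution_before
    return factor


def total_cycles(stages: List[Stage]) -> int:
    return sum(stage.cycles for stage in stages)


def total_hold_seconds(stages: List[Stage]) -> int:
    return sum(stage.cycles * sum(stage.hold_seconds) for stage in stages)


for d in (10, 100):
    stages = program(d)
    print(d, cumulative_dilution(stages), total_cycles(stages), total_hold_seconds(stages))
# 10 240 80 11900
# 100 2400 80 11900
```

On this arithmetic, the original 10 µl assembly has been diluted 240- to 2,400-fold overall by the start of the final amplification. The whole procedure spans 80 cycles.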
pCKX-GFP is pBAD24 in which the ClaI-NcoI arabinose regulatory unit cassette was
replaced by a variant of the lux autoinducer system of Vibrio fischeri from pJGR (Devine et
al., Biochim. Biophys. Acta 1173:121-132 (1993)). The ligation was transformed into XL1-Blue.
The libraries were plated on LB-Amp200 and grown overnight at 37°C. The colonies were
picked into six 384-well plates and grown overnight. The cultures were gridded onto
nitrocellulose, and the colonies were grown overnight at 30°C. The plate was
incubated for 45 min at 60°C. The nitrocellulose filter was then placed onto blotting paper
containing 100 mM Na-citrate pH 5, 0.2% Triton X-100 and 1 mM D-luciferin. This
was placed onto plastic wrap with the nitrocellulose and colonies facing down. This
assembly was placed against BIOMAX MR film in a film cassette for 30 min.
After development, the film was scored by eye, the brightest clones were inoculated from the
384-well plates, and these clones were grown overnight at 30°C in 75 µl LB-Amp in 96-well format.
The luciferase was extracted from these cultures as follows. A culture volume of 20 µl was mixed
with 20 µl lysis buffer I (100 mM Tris-Cl pH 7.8, 5% Triton X-100, 10 mM DTT, 10 mM
EDTA, 2 mg/ml polymyxin B sulfate). After shaking, the reaction mixture
was frozen for 1 hr at -70°C and then thawed at room temperature. 60 µl
of lysis buffer II (100 mM Tris-Cl pH 7.8, 0.25 U/µl DNase I, 1.5 mg/ml hen egg
lysozyme, 40 mM MgSO4) was added, and the lysis mixture was incubated
for 30 min at room temperature. Aliquots of the lysates were incubated for 30
min at various temperatures between 30°C and 42°C. In addition, aliquots
were left at room temperature (RT) for several days. The luciferase activity of 5 µl of the
standard lysate and of the heat-treated lysates was measured using 50 µl complete assay
buffer (20 mM Tris-Cl pH 7.8, 5 mM MgSO4, 0.5 mM ATP, 0.5 mM coenzyme
A, 0.5 mM D-luciferin, 5 mM DTT) in a TopCount luminometer. Several
clones showed an increase in residual activity after heat treatment and after being
left at RT for several days. One clone showed 5-fold higher luciferase
activity than the wild-type Luciola mingrelica clone in E. coli extracts
when treated for 30 minutes at 39°C. After 4 days of incubation at RT, the same
clone showed ten-fold more activity than wild-type L. mingrelica luciferase
that had been treated identically. In addition, this clone showed a significant
increase (2-fold) in activity over wild type when grown at 37°C.
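The stability screen compares each lysate's signal after heat treatment with its untreated signal. It also compares variants with identically treated wild-type L. mingrelica luciferase. The following minimal sketch computes those two ratios and assumes raw luminometer counts with no background correction. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def residual_activity(heated: float, untreated: float) -> float:
    """Fraction of luminescence retained after heat treatment (heated / untreated lysate)."""
    if untreated <= 0:
        raise ValueError("untreated signal must be positive")
    return heated / untreated


def fold_over_wild_type(variant: float, wild_type: float) -> float:
    """Ratio of a variant's signal to the identically treated wild-type signal."""
    if wild_type <= 0:
        raise ValueError("wild-type signal must be positive")
    return variant / wild_type


def rank_by_residual_activity(
    readings: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Sort clones by residual activity, highest first; readings map clone id -> (heated, untreated)."""
    scores = [(clone, residual_activity(h, u)) for clone, (h, u) in readings.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For the 39°C comparison in the text, `fold_over_wild_type` would be applied to a variant lysate and a wild-type lysate that were both heated for 30 minutes at 39°C.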

These results demonstrate the evolution of a luciferase with improved 
stability relative to the parental donor substrate molecules.

Although the foregoing invention has been described in some detail by 
way of illustration and example for purposes of clarity of understanding, it 
will be obvious that certain changes and modifications may be practiced 
within the scope of the appended claims. 

All references cited herein are expressly incorporated in their entirety 
for all purposes. 

Any discussion of documents, acts, materials, devices, articles or the 
like which has been included in the present specification is solely for the 
purpose of providing a context for the present invention. It is not to be taken 
as an admission that any or all of these matters form part of the prior art base 
or were common general knowledge in the field relevant to the present 
invention as it existed in Australia before the priority date of each claim of 
this application. 



THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS: 

1. A method for producing a recombinant DNA encoding a protein, the
method comprising:

(a) digesting at least a first and second DNA substrate molecule, 
wherein the at least first and second substrate molecules are homologous and 
differ from each other in at least one nucleotide, with a restriction 
endonuclease, wherein the at least first and second DNA substrate molecules 
each encode a protein, or are homologous to a protein-encoding DNA 
substrate molecule; 

(b) ligating the resulting mixture of DNA fragments to generate a library 
of recombinant DNA molecules, which library comprises a plurality of DNA 
molecules, each comprising a subsequence from the first nucleic acid and a 
subsequence from the second nucleic acid, wherein the plurality of DNA 
molecules are homologous; 

(c) screening or selecting the resulting products of (b) for a desired
property;

(d) recovering a recombinant DNA molecule encoding an evolved 
protein; and, 

(e) repeating steps a-d using the recombinant DNA molecule of step (d) 
as the first or second DNA substrate molecule of step (a), whereby a 
recombinant DNA encoding a protein is produced. 

2. The method of claim 1, wherein the restriction endonuclease generates 
non-palindromic ends at cleavage sites. 

3. The method of claim 1 or claim 2, wherein the substrate molecules 
have been engineered to contain at least one recognition site for a restriction 
endonuclease having non-palindromic ends at cleavage sites. 

4. The method according to any one of claims 1 to 3, wherein (a) - (d) are 
repeated at least once. 

5. The method according to any one of claims 1 to 4, wherein the DNA 
substrate molecule comprises a gene cluster. 



6. The method according to any one of claims 1 to 5, wherein the evolved 
protein encoded by the recovered recombinant DNA molecule is alpha 
interferon. 

7. The method according to any one of claims 1 to 5, wherein at least one 
restriction endonuclease fragment from a DNA substrate molecule is isolated 
and subjected to mutagenesis to generate a library of mutant fragments. 

8. The method of claim 7, wherein the library of mutant fragments is used 
in the ligation of (b). 

9. The method of claim 8, wherein the DNA substrate molecule encodes 
all or part of a protein selected from Table 1 or alpha interferon. 

10. The method according to any one of claims 7 to 9, wherein 
mutagenesis comprises recursive sequence recombination. 

11. The method according to any one of claims 1 to 10, wherein the 
products of (d) are subjected to mutagenesis. 

12. The method of claim 11, wherein mutagenesis comprises recursive 
sequence recombination. 

13. The method of claim 11 or claim 12, wherein the products of claim 11
or claim 12 are used in (d). 

14. The method according to any one of claims 1 to 13, wherein the 
products of (d) are used as a DNA substrate molecule in (b). 

15. The method according to any one of claims 1 to 14, wherein the 
recombinant DNA substrate molecule of (d) comprises a library of 
recombinant DNA substrate molecules. 

16. The method according to any one of claims 1 to 15, wherein the library 
of step (b) comprises lO'-lO* different members. 
17. The method according to any one of claims 1 to 16, further comprising
expressing the evolved protein in vitro or in a cell. 

18. An evolved protein produced by the method of claim 17. 

19. A recombinant DNA molecule produced by the method according to 
any one of claims 1 to 16. 

20. A method for evolving a protein encoded by a recombinant DNA 
substrate molecule by recombining at least a first and second DNA substrate 
molecule, the method comprising: 

(a) providing at least first and second substrate molecules which differ 
from each other in at least one nucleotide and which comprise defined 
segments, the first and second substrate molecule each encoding a protein, or 
being homologous to a protein-coding DNA, and providing a set of
oligonucleotide PCR primers, the set of PCR primers comprising a plurality 
of primers, each of the plurality of PCR primers comprising a first 
subsequence which is complementary to a first segment from the first 
substrate molecule and a second subsequence which is complementary to a
second segment from the second substrate nucleic acid, wherein the first 
segment from the first substrate molecule comprises at least one nucleotide 
difference as compared to the second segment; 

(b) amplifying the segments of the at least a first and second DNA 
substrate molecules with the primers of step (a) in a polymerase chain
reaction; 

(c) assembling the products of step (b) to generate a library of 
recombinant DNA substrate molecules; 

(d) screening or selecting the products of (c) for a desired property; and 

(e) recovering a recombinant DNA substrate molecule from (d) thereby 
providing a recombinant DNA substrate molecule encoding an evolved 
protein; and, 

(f) expressing the evolved protein, thereby producing the evolved 
protein. 

21. The method of claim 20, wherein the at least a first and second DNA
substrate molecules are subjected to mutagenesis prior to step (a). 
22. The method of claim 21, wherein the mutagenesis comprises recursive
sequence recombination. 

23. The method according to any one of claims 20 to 22, wherein the at 
least a first and second DNA substrate molecules comprise alleles of a gene. 

24. The method according to any one of claims 20 to 22, wherein the at 
least a first and second DNA substrate molecules comprise a library of 
mutants. 

25. The method according to any one of claims 20 to 22, wherein the at
least a first and second DNA substrate molecule comprises a gene cluster. 

26. The method according to any one of claims 20 to 22, wherein the at 
least first and second DNA substrate molecule encodes all or part of a DNA 
polymerase or alpha interferon. 

27. The method according to any one of claims 20 to 26, wherein the 
segments are defined by sites within intergenic regions. 

28. The method according to any one of claims 20 to 27, wherein the 
segments are defined by sites within introns.

29. The method according to any one of claims 20 to 28, wherein the 
primers comprise a uracil substitution at one or more thymidine residues. 

30. The method of claim 29, wherein the products of (b) are treated with
uracil glycosylase. 

31. The method according to any one of claims 20 to 30, wherein (a) - (e) 
are repeated. 



32. The method according to any one of claims 20 to 31, wherein at least
one PCR primer differs from the at least a first and second DNA substrate 
molecules in at least one nucleotide. 
33. The method of claim 32, wherein the PCR primer comprises a
nucleotide sequence of a known mutant or polymorphism of the at least a 
first or second DNA substrate molecule. 

34. The method of claim 33, wherein the PCR primer is degenerate and 
encodes the nucleotide sequence of more than one known mutant or 
polymorphism of the at least a first or second DNA substrate molecule. 

35. The method of claim 34, wherein the at least a first and second DNA
substrate molecule encodes all or part of a protein selected from Table 1 or 
alpha interferon. 

36. The method according to any one of claims 20 to 35, wherein the 
products of (e) are subjected to mutagenesis. 

37. The method of claim 36, wherein mutagenesis comprises recursive 
sequence recombination. 

38. The method of claim 36 or claim 37, wherein the products of claim 36 
or claim 37 are used in (b). 

39. The method according to any one of claims 20 to 38, wherein the 
products of (e) are used as a DNA substrate molecule in (b). 

40. The method according to any one of claims 20 to 39, wherein the 
recombinant DNA substrate molecule of (e) comprises a library of 
recombinant DNA substrate molecules. 

41. The method according to any one of claims 20 to 40, further 
comprising expressing the evolved protein in vitro or in a cell. 

42. An evolved protein produced by the method of claim 41. 
43. The evolved protein or recombinant DNA substrate molecule encoding
the evolved protein produced by the method according to any one of claims 
20 to 40. 
44. A method for evolving a protein encoded by a recombinant DNA
substrate molecule, by recombining at least a first and second DNA substrate
molecule, the method comprising:

(a) providing at least first and second substrate molecules, which first 
and second substrate molecules each encode a protein, or are homologous to
a protein-coding DNA substrate molecule, which first and second substrate
molecules share a region of sequence homology of about 10 to 100 base pairs 
and comprise defined segments and providing regions of homology in the at 
least a first and second DNA substrate molecules by inserting an intron 
sequence between at least two defined segments; 

(b) fragmenting and recombining DNA substrate molecules of (a),
wherein regions of homology are provided by the introns;

(c) screening or selecting the products of (b) for a desired property; and 

(d) recovering the recombinant DNA substrate molecule from the
products of (c) thereby providing a recombinant DNA substrate molecule
encoding an evolved protein; and

(e) expressing the evolved protein, thereby producing the evolved 
protein. 
45. The method of claim 44, wherein the introns are self-splicing.

46. The method of claim 44 or claim 45, wherein the inserted introns 
comprise from 1 to 10 nonhomologous introns. 

47. The method according to any one of claims 44 to 46, wherein the 
intron comprises a recognition site for a restriction endonuclease having
non-palindromic ends at cleavage sites.

48. The method according to any one of claims 44 to 47, wherein (b) - (d)
are repeated. 

49. The method according to any one of claims 44 to 48, wherein the DNA
substrate molecule comprises a gene cluster. 

50. The method according to any one of claims 44 to 49, wherein at least 
one segment from a DNA substrate molecule is isolated and subjected to 
mutagenesis to generate a library of mutant fragments. 

51. The method of claim 50, wherein the library of mutant segments is 
used in the recombination of (b). 

52. The method according to any one of claims 44 to 51, wherein the 
segments are defined by exons. 

53. The method according to any one of claims 44 to 51, wherein the 
segments are defined by intergenic regions.

54. The method according to any one of claims 44 to 53, wherein the at 
least a first and second DNA substrate molecules encode protein homologues. 

55. The method according to any one of claims 44 to 54, wherein the
intron contains a lox site, and wherein the products of (b) are used to
transfect a Cre+ host.

56. The method according to any one of claims 44 to 55, wherein the at
least a first and second DNA substrate molecule encodes all or part of a
protein selected from Table 1 or alpha interferon.

57. The method according to any one of claims 44 to 56, wherein the at
least a first and second DNA substrate molecule are subjected to mutagenesis
prior to step (a).

58. The method according to any one of claims 44 to 57, wherein the 
products of (d) are subjected to mutagenesis. 
59. The method of claim 57 or claim 58, wherein the mutagenesis
comprises recursive sequence recombination. 
60. The method according to any one of claims 44 to 59, wherein the
products of (d) are used as a DNA substrate molecule in (b). 

61. The method according to any one of claims 44 to 60, wherein the 
recombinant DNA substrate molecules of (d) comprise a library of
recombinant DNA substrate molecules. 

62. The method according to any one of claims 44 to 61, further 
comprising expressing the evolved protein in vitro or in a cell. 

63. An evolved protein produced by the method of claim 62. 

64. The evolved protein or recombinant DNA substrate molecule encoding 
the evolved protein produced by the method according to any one of claims 
44 to 61. 

65. A method for evolving a protein encoded by a DNA substrate molecule 
the method comprising: 

(a) providing a set of oligonucleotide PCR primers, for amplification 
and recombination of at least a first and second DNA substrate molecule, 
wherein the at least a first and second substrate molecules differ from each 
other in at least one nucleotide and comprise defined segments and wherein 
for each junction of segments a pair of primers is provided, one member of 
each pair bridging the junction at one end of a segment and the other bridging 
the junction at the other end of the segment, with the terminal ends of the 
DNA molecule having as one member of the pair a generic primer, and 
wherein a set of primers is provided for each of the at least a first and second 
substrate molecules; 

(b) amplifying the segments of the at least a first and second DNA 
substrate molecules with the primers of (a) in a polymerase chain reaction; 

(c) assembling the products of (b) to generate a pool of recombinant 
DNA molecules; 

(d) selecting or screening the products of (c) for a desired property; 

(e) recovering a recombinant DNA substrate molecule from the 
products of (d) encoding an evolved protein; and 



(f) expressing the evolved protein, thereby producing the evolved 
protein. 

66. The method of claim 65, wherein (a) - (e) are repeated.

67. The method of claim 65 or claim 66, wherein the at least a first and 
second DNA substrate molecule are subjected to mutagenesis prior to (a). 

68. The method of claim 67, wherein the mutagenesis comprises recursive 
sequence recombination. 

69. The method according to any one of claims 65 to 68, wherein the at 
least a first and second DNA substrate molecule comprise sequences 
encoding protein homologues. 

70. The method according to any one of claims 65 to 69, wherein the 
primers comprise a uracil substitution at one or more thymidine residues. 

71. The method of claim 70, wherein the products of (b) are treated with 
uracil glycosylase. 

72. The method according to any one of claims 65 to 71, wherein the at 
least a first and second DNA substrate molecule encodes all or part of a 
protein selected from Table 1 or alpha interferon. 

73. The method according to any one of claims 65 to 71, wherein the at
least a first and second DNA substrate molecule comprises a gene cluster. 

74. The method according to any one of claims 65 to 73, wherein at least 
one PCR primer differs from the at least a first and second substrate 
molecules in at least one nucleotide. 

75. The method of claim 74, wherein the PCR primer comprises a
nucleotide sequence of a known mutant or polymorphism of the at least a 
first or second substrate molecule. 
76. The method of claim 75, wherein the PCR primer is degenerate and
encodes the nucleotide sequence of more than one known mutant or 
polymorphism of the at least a first or second substrate molecule. 

77. The method according to any one of claims 65 to 76, wherein the
products of (e) are subjected to mutagenesis. 

78. The method of claim 77, wherein mutagenesis comprises recursive 
sequence recombination. 

79. The method according to any one of claims 65 to 78, wherein the
products of (e) are used as a DNA substrate molecule in (b).

80. The method according to any one of claims 65 to 79, wherein the
recombinant DNA substrate molecule of (e) comprises a library of
recombinant DNA substrate molecules.

81. The evolved protein or recombinant DNA substrate molecule encoding
the evolved protein produced by the method according to any one of claims
65 to 80.

82. A method for recombining at least a first and second DNA substrate 
molecule, comprising: 

(a) transfecting a host cell with at least a first and second DNA 
substrate molecule wherein the at least a first and second DNA substrate
molecules are recombined in the host cell; 

(b) screening or selecting the products of (a) for a desired property; 

(c) recovering recombinant DNA substrate molecules from (b); and 

(d) repeating steps (a) - (c).

83. The method of claim 82, wherein the products of (c) are subjected to 
mutagenesis. 

84. The method of claim 83, wherein the mutagenesis comprises recursive 
sequence recombination.
85. The method according to any one of claims 82 to 84, wherein (a) - (d)
are repeated. 

86. The method according to any one of claims 82 to 85 wherein the 
products according to any one of claims 82 to 85 are used in (a). 

87. A method of performing oligonucleotide mediated recombination, the 
method comprising: 

providing a first and a second nucleic acid; 

selecting segments in the first and second nucleic acid; 

providing a plurality of bridge oligonucleotides, which bridge 
oligonucleotides each comprise at least a first subsequence which is 
complementary to at least one segment in the first nucleic acid and at least a 
second subsequence which is complementary to the second nucleic acid; 

extending the plurality of bridge oligonucleotides with a polymerase, 
using the first and second nucleic acids, or subsequences of the first and 
second nucleic acids, as templates, thereby producing a plurality of 
recombinant nucleic acid segments; and, 

providing a plurality of recombinant nucleic acids, each comprising 
one or more subsequences comprising one or more of the recombinant nucleic
acid segments. 

88. The method of claim 87, wherein the plurality of recombinant nucleic 
acids are produced by assembly PCR performed using a pool of nucleic acids 
comprising the recombinant nucleic acid segments. 

89. The method of claim 87 or claim 88, wherein the plurality of bridge 
oligonucleotides are extended by PCR. 

90. The method according to any one of claims 87 to 89, further
comprising recombining at least one of the plurality of recombinant nucleic 
acids with at least one additional nucleic acid to produce a further 
recombined nucleic acid. 

91. The method of claim 90, further comprising selecting the further 
recombined nucleic acid for a desired property. 
92. A method for making a modified or recombinant nucleic acid, the
method comprising: 

(a) providing a single-stranded template nucleic acid; 

(b) providing a population of nucleic acid fragments, the nucleic acid 
fragments being produced by fragmentation of an at least first 
nucleic acid substrate molecule or at least first nucleic acid 
substrate molecules, said at least first nucleic acid substrate 
molecule or molecules being homologous to the template nucleic 
acid, and differing from the template nucleic acid in at least one 
nucleotide; 

(c) contacting the single-stranded template nucleic acid with the 
population of nucleic acid fragments, thereby producing at least
one annealed nucleic acid product; and, 

(d) contacting the products of (c) with a ligase, thereby producing a 
modified or recombinant nucleic acid. 

93. The method of claim 92, wherein (d) further comprises contacting the 
at least one annealed nucleic acid product of (c) with a polymerase. 

94. The method of claim 92 or claim 93, wherein the nucleic acid 
fragments of (b) are produced by fragmentation of the at least first nucleic 
acid substrate molecules using enzymatic digestion, DNase digestion, RNase
digestion, restriction enzyme digestion, sonication, or random shearing. 

95. A method for making a modified or recombinant nucleic acid, the 
method comprising: 

(a) providing a selected single-stranded template nucleic acid; 

(b) contacting the selected single-stranded template nucleic acid with a
population of nucleic acids or nucleic acid fragments, wherein the
population of nucleic acids or nucleic acid fragments comprises 
one or more of: 

(i) nucleic acids or nucleic acid fragments which comprise nucleic 
acid sequences which are homologous to the single-stranded 
template nucleic acid; 

(ii) nucleic acids or nucleic acid fragments resulting from digestion of 
at least first substrate molecules with a DNase, 

(iii) nucleic acids or nucleic acid fragments which comprise nucleic 
acid sequences produced by mutagenesis of a parental nucleic 
acid, 

(iv) nucleic acids or nucleic acid fragments resulting from digestion of 
at least first substrate molecules with a restriction enzyme, 

(v) nucleic acids or nucleic acid fragments comprising at least one 
nucleic acid sequence which is homologous to the single-stranded 
template nucleic acid, which sequence is present in the 
population at a concentration of less than 1% by weight of the 
total population of nucleic acids or nucleic acid fragments, 

(vi) nucleic acids or nucleic acid fragments comprising at least one 
hundred nucleic acid sequences which are homologous to the 
template, or 

(vii) nucleic acids or nucleic acid fragments comprising sequences of 
at least 50 nucleotides, 

thereby producing an annealed nucleic acid product; and 
contacting the annealed nucleic acid with a polymerase or a ligase,
thereby producing a modified or recombinant nucleic acid strand.



96. The method of claim 95, wherein the nucleic acid molecules of (b)(i) are
produced by chemical synthesis. 

97. The method of any one of claims 92 to 95, wherein said at least first 
nucleic acid substrate molecules include at least two substrate molecules, 
which at least two substrate molecules are natural variants of one another, or 
which at least two substrate molecules are allelic or species variants of one 
another. 

98. The method of any one of claims 92 to 95, wherein said at least first
nucleic acid substrate molecules are induced variants of one another. 

99. The method of any one of claims 92 to 98, wherein at least one of said 
at least first nucleic acid substrate molecules or said single stranded template 
nucleic acid comprises one or more of: a genomic nucleic acid, a cDNA, a
vector, or a phagemid. 

100. The method of any one of claims 92 to 99, wherein the population of 
nucleic acid fragments comprises single-stranded nucleic acids.

101. The method of any one of claims 92 to 99, wherein the population of 
nucleic acid fragments comprises double-stranded nucleic acids. 

102. The method of claim 101, wherein the population of double-stranded
nucleic acid fragments are denatured prior to contacting the single-stranded 
template nucleic acid. 

103. The method of any one of claims 92 to 102, wherein the template
nucleic acid comprises uracil and the method further comprises degrading
the template nucleic acid.

104. The method of any one of claims 92 to 103, wherein the template
nucleic acid comprises uracil and the method further comprises degrading
the template nucleic acid and releasing the resulting cleaved template nucleic
acid from the annealed nucleic acid.

105. The method of claim 104, wherein the template nucleic acid is 
degraded in vitro or in vivo. 

106. The method of claim 104, said method further comprising synthesizing 
a nucleic acid strand which is complementary to the modified or recombinant 
nucleic acid strand. 

107. The method of any one of claims 92 to 106, said method further
comprising: selecting or screening the modified or recombinant nucleic acid,
or an encoded product thereof, for a desired property.

108. The method of claim 107, said method further comprising: 
recovering or isolating the modified nucleic acid or encoded product thereof
having the desired property. 
109. The method of claim 108, said method further comprising:
using the modified nucleic acid or encoded product thereof having the
desired property as one or more of: the at least first nucleic acid substrate
molecule or molecules, or the template. 

110. The method of any one of claims 92 to 109, said method further 
comprising: transforming the recombinant or modified nucleic acid into a 
host. 

111. The method of claim 110, wherein the host is a mutS host. 

112. The method of any one of claims 92 to 111, wherein said at least one
annealed nucleic acid product comprises at least one nucleic acid fragment 
annealed to the single-stranded template nucleic acid, wherein said method 
further comprises extending said at least one nucleic acid fragment using a 
polymerase. 

113. The method of any one of claims 92 to 112, wherein said at least one 
annealed nucleic acid product comprises at least two nucleic acid fragments 
annealed to the single-stranded template nucleic acid, wherein a sequence of
at least one nucleic acid fragment overlaps with a sequence of at least a
second nucleic acid fragment.

114. The method of any one of claims 92 to 113, wherein said at least one 
annealed nucleic acid product is contacted with a nuclease. 

115. The method of any one of claims 92 to 114, wherein the single- 
stranded template nucleic acid comprises one or more of: a vector, a 
phagemid, an RNA and a DNA. 

116. The method of any one of claims 92 to 115, wherein the nucleic acid 
population comprises one or more of: 100 or more members, 500 or more 
members, or 1000 or more members. 
117. The method of any one of claims 92 to 115, wherein at least one
substrate nucleic acid molecules comprise at least a first substrate molecule 
homologous to the template nucleic acid, which at least first substrate 
molecule differs from the template nucleic acid in at least one nucleotide, and 
is present in the population of nucleic acid molecules at a concentration of 
less than 1% by weight of the total population of nucleic acid molecules. 

118. The method of any one of claims 92 to 117, wherein the at least one 
substrate nucleic acid molecules comprise members which are at least 50 
nucleotides in length. 

119. The method of any one of claims 92 to 118, the method further 
comprising mutating or shuffling the modified or recombinant nucleic acid. 

120. The method of any one of claims 92 to 119, wherein at least one 
nucleic acid molecule of the population of nucleic acid molecules comprises 
a sequence produced by mutagenesis of a nucleic acid corresponding to the 
template nucleic acid. 

121. A modified or recombinant nucleic acid produced by the method of 
any of claims 92 to 120. 

122. A polypeptide encoded by the modified or recombinant nucleic acid of 
claim 121. 

123. A library of modified or recombinant nucleic acids produced by the 
method of any one of claims 92 to 120.

124. A method for optimizing expression of a protein by evolving the 
protein, wherein the protein is encoded by a DNA substrate molecule, 
comprising: 

(a) providing a set of oligonucleotides, wherein each oligonucleotide 
comprises at least two regions complementary to the DNA molecule and at 
least one degenerate region, each degenerate region encoding a region of an 
amino acid sequence of the protein; 
(b) assembling the set of oligonucleotides into a library of full length
genes;

(c) expressing the products of (b) in a host cell; 

(d) screening the products of (c) for improved expression of the 
protein; 

(e) recovering a recombinant DNA substrate molecule encoding an
evolved protein from (d); and

(f) expressing the evolved protein, thereby producing the evolved
protein.

125. The method of claim 124, wherein the primers comprise about 20
nucleotides complementary to the DNA substrate molecule followed by a
second region of about 20 degenerate nucleotides of homology with the DNA
substrate molecule followed by about 20 nucleotides complementary to the
DNA substrate molecule.
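Claim 125 describes an oligonucleotide of roughly 60 nucleotides made of three regions of about 20 nucleotides each, the central one degenerate. A minimal Python sketch, assuming exactly 20-nucleotide flanks and the standard IUPAC ambiguity codes, splits such an oligonucleotide into its regions and counts how many distinct sequences the degenerate centre encodes. SEQ ID NO: 33 from the sequence listing is used as input because it appears to follow this 20/20/20 layout; the helper names are illustrative only.

```python
# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt",
}


def degeneracy(seq: str) -> int:
    """Number of distinct sequences encoded by a (possibly degenerate) oligonucleotide."""
    count = 1
    for base in seq.lower():
        if base.isspace():
            continue  # sequence listings group bases in blocks of ten
        count *= len(IUPAC[base])
    return count


def split_regions(oligo: str, flank: int = 20):
    """Split an oligo into 5' flank, central region and 3' flank (flank length is an assumption)."""
    s = "".join(oligo.split()).lower()
    return s[:flank], s[flank:-flank], s[-flank:]


seq33 = "gcgaaaacag caacgtcttc rccrccrtgr gtytcrgahg cctgcggaac agcagcctgc"
left, core, right = split_regions(seq33)

print(len(left), len(core), len(right))     # 20 20 20
print(degeneracy(left), degeneracy(right))  # 1 1
print(degeneracy(core))                     # 192
```

Each ambiguity code multiplies the count, so the 192 variants found here for a single 60-mer (five R, one Y and one H position: 2^5 x 2 x 3) show how quickly a set of such oligonucleotides adds diversity to a library.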

126. The method of claim 124 or claim 125, wherein the protein is bovine 
intestinal alkaline phosphatase. 

127. The method of claim 126, wherein the oligonucleotides comprise one
or more primers from Table II.

128. The method of claim 124 or claim 125, wherein the DNA substrate 
molecule encodes all or part of a protein selected from Table I or alpha 
interferon. 

129. The method of claim 124, wherein the DNA molecule comprises a gene 
cluster. 

130. The method according to any one of claims 124 to 129, wherein (a)-(e)
are repeated at least once.

131. The method according to any one of claims 124 to 130, wherein the
oligonucleotides comprise at least 5' and 3' nucleotides complementary to the
DNA substrate molecule and 20-300 nucleotides having up to about 85%
sequence homology with a region of the DNA substrate molecule.
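Claim 131 limits the central part of each oligonucleotide to 20-300 nucleotides sharing up to about 85% sequence homology with the substrate. The claim does not define how homology is measured; a minimal sketch, assuming ungapped percent identity over equal-length regions, checks both conditions. The function names and the two 20-nucleotide sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length DNA sequences."""
    a, b = a.lower(), b.lower()
    if len(a) != len(b):
        raise ValueError("regions must have equal length; no alignment is performed")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def central_region_ok(region: str, substrate_region: str,
                      max_identity: float = 85.0) -> bool:
    """Check the 20-300 nt length window and the identity ceiling (assumed reading of claim 131)."""
    if not 20 <= len(region) <= 300:
        return False
    return percent_identity(region, substrate_region) <= max_identity


# Hypothetical 20-nt regions differing at 3 of 20 positions (17/20 = 85% identity).
substrate = "atgaaacagtctaccattgc"
variant = "atgaagcagtcgaccatcgc"

print(percent_identity(variant, substrate))     # 85.0
print(central_region_ok(variant, substrate))    # True
print(central_region_ok(substrate, substrate))  # False (100% identity)
```

A real design would more likely align the region to the substrate (allowing gaps) before scoring; the ungapped comparison keeps the sketch short.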
132. The method of claim 131, wherein the oligonucleotides comprise a set
of oligonucleotides in which each oligonucleotide overlaps with a second 
oligonucleotide. 


133. The method according to any one of claims 124 to 132, wherein the
products of (e) are subjected to mutagenesis.

134. The method of claim 133, wherein mutagenesis comprises recursive 
sequence recombination.

135. The method according to any one of claims 124 to 134, wherein the 
recombinant DNA substrate molecule of (e) comprises a library of 
recombinant DNA substrate molecules. 


136. The evolved protein or recombinant DNA substrate molecule encoding 
the evolved protein produced by the method according to any one of claims 
124 to 135. 

137. A method for optimizing secretion of a protein in a host by evolving a
gene encoding a secretory function, comprising: 

(a) providing a cluster of genes encoding secretory functions; 

(b) recombining at least a first and second sequence in the gene
cluster of (a) encoding a secretory function, the at least a first and second 
sequences differing from each other in at least one nucleotide, to generate a 
library of recombinant sequences; 

(c) transforming a host cell culture with the products of (b), wherein
the host cell comprises a DNA sequence encoding the protein; 

(d) subjecting the product of (c) to screening or selection for secretion
of the protein; and

(e) recovering DNA encoding an evolved protein comprising a
secretory function from the product of (d).

138. The method of claim 137, wherein the gene cluster comprises at least 
one recognition site for a restriction endonuclease having nonpalindromic 
ends at the cleavage site. 
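Claim 138 calls for a restriction site whose cleavage leaves nonpalindromic ends. One plausible reason (an interpretation, not stated in the claim) is that a self-complementary overhang can anneal to a copy of itself, so fragments may join in either orientation, whereas a nonpalindromic overhang pairs only with its designated partner and so supports ordered, directional reassembly of a gene cluster. A minimal sketch of the self-complementarity test, using arbitrary example overhangs:

```python
# Watson-Crick complement for unambiguous bases.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A/C/G/T (any case)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def is_palindromic_overhang(overhang: str) -> bool:
    """An overhang is palindromic (self-complementary) if it equals its own reverse complement."""
    return overhang.lower() == reverse_complement(overhang)


for oh in ["gatc", "aatt", "acgg", "tga"]:
    print(oh, is_palindromic_overhang(oh))
# gatc True
# aatt True
# acgg False
# tga False
```

Odd-length overhangs such as the 3-nucleotide example can never be palindromic, because the central base would have to equal its own complement.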
139. The method of claim 137 or claim 138, wherein the host is E. coli,
yeast, Bacillus, Pseudomonas, or a mammalian cell.

140. The method according to any one of claims 137 to 139, wherein the
protein is a thermostable DNA polymerase. 

141. The method according to any one of claims 137 to 139, wherein the
DNA sequence of (c) encodes all or part of a protein selected from Table I or
alpha interferon.

142. The method according to any one of claims 137 to 139, wherein the 
DNA sequence of (c) comprises a library of mutant sequences. 

143. The method according to any one of claims 137 to 142, wherein the
products of (e) are subjected to mutagenesis. 

144. The method of claim 143, wherein mutagenesis comprises recursive 
sequence recombination. 

145. The method of claim 143 or claim 144, wherein the products of claim
143 or claim 144 are used in (a). 

146. The method according to any one of claims 137 to 145, wherein the 
protein is inducibly expressed.

147. The method according to any one of claims 137 to 146, wherein the
protein is linked to a secretory leader sequence. 

148. The method according to any one of claims 137 to 147, wherein (a)-(e)
are repeated at least once. 

149. The method according to any one of claims 137 to 148, wherein the
DNA of (e) comprises a library of evolved genes. 
150. A secretory gene evolved by the method according to any one of claims
137 to 149.

Dated this twenty-second day of January 2001 

Maxygen, Inc. 

Patent Attorneys for the Applicant 
F B RICE & CO
SEQUENCE LISTING



<110> Patten, Phillip 
Stemmer, Willem

<120> METHODS AND COMPOSITIONS FOR POLYPEPTIDE ENGINEERING 



<130> 02-205-0 

<140> 08/769,062
<141> 1996-12-18 

<150> 08/198,431
<151> 1994-02-17 

<150> 08/425,684 
<151> 1995-04-18

<150> 08/537,874 
<151> 1995-10-30 



<160> 98 

<170> PatentIn Ver. 2.0



<210> 1 
<211> 50 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aaccctccag ttccgaaccc catatgatga tcaccctgcg taaactgccg 

<210> 2 
<211> 38 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library

<400> 2 

aaccctccag ttccgaaccc catatgaaaa aaaccgct 



<210> 3 
<211> 40
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aaccctccag ttccgaaccc atatacatat gcgtgctaaa 
<210> 4
<211> 44
<212> DNA

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aaccctccag ttccgaaccc catatgaaat acccgccgcc gacc 

<210> 5
<211> 40
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

<400> 5
aaccctccag ttccgaaccc gatatacata tgaaacagtc 

<210> 6 
<211> 60 
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

t«,=ce.„ cd.t„cd, t «o, Ct cc„ tt,a.,a,=. « 

<210> 7
<211> 60
<212> DNA

<213> Artificial Sequence 

<210> 8 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate

:rc4.« c y «„=«= ***** — - - 

<210> 9 



<211> 60
<212> DNA 

<213> Artificial Sequence 



££4,=t ,ccc.,cc„ cd,-.„cd. t ««« ««<--«« «~-~ 60 



<210> 10
<211> 61
<212> DNA

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

^gccgcSgct gttcaccccg gtdacyaarg cdgcdcargt dctggtcccg gttgaagagg 60 



<210> 11 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate

«»-«— E " d "« iC ,tt,ot ct,CMCC 80 



<210> 12 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate

"r,°««c« ,«c«„,t mm «*— « * cec '" tce 



60 



<210> 13 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

^gggtc cggaaacccc dccggcdatg gaycarttyc cgtacgt.gc tctgtctaaa 60 



<210> 14 
<211> 60
<212> DNA 

<213> Artificial Sequence



<400> 14 M.^f dravct— «c ggtgttaaag gcaactaccg 60 

ggttccggac tcrgctggta cygcdacygc dtayct„.gc gg y 

<210> 15 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

cigcUtta caaccagtgc aaracyacyc gyggyaayga agttacctct gttatgaacc 60 

<210> 16 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

cc^tggtg « 9 «accac yacycgygtd carcaygcdt ctccggctgg tgcttacgct 60 

<210> 17

<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

^taftctgac gctgacctgc cdgcdgaygc dcaratoaac gg«occagg acatcgctgc 60 

<210> 18 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

acaJcgacgt tatcctgggt ggygqycgya artayatgtt cccggctgat accccggacc 60 

<210> 19 
<211> 60 
<212> DNA

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

tZUllc , W cc,». rc,„arc.r ..ycc„ t dc .„=«„ea „«...<:.= «» 



<210> 20 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library

gaaccgtacc gctctgctgc arg.dgcdga ygaytcytct gttacccacc tgatgggtct 60 



<210> 21 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

aaJacaacgt tcagcaggac cayacyaarg ayccdacyct gcaggaaatg accgaagttg 60 



<210> 22 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

. jacccgcgtg gtttctacct gttygtdgar ggygqycgya tcgaccacgg tcaccacgac 60 



<210> 23 
<211> 60 
<212> DNA 

<213> Artificial Sequence



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

gaccgaagct ggtatgttcg ayaaygedat ygedaaroct aacgaactga cctctgaact 60 



<210> 24 
<211> 60 
<212> DNA 
<213> Artificial Sequence

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

ccgtgfcca ctcccacgtt ttytcytty, gyggytayac cctgcgtggt acctctatcc 60 

<210> 25

<211> 60
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

gctcJggact ctaaatctta yacytcyaty ctgtayggya acggtccggg ttacgccctg 60 

<210> 26 
<211> 60
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

cJtSaacgac tctacctctg argayccdtc ytaycarcag caggctcctg ttccgcaggc 60 

<210> 27 
<211> 60 
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aagacgtU tgttttcgct cgyggyccdc argcdcayct ggttcacggt gttgaagaag 60 

<210> 28
<211> 60
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

a^t'tcg ctggttgcgt dgarccdtay acygaytgya acct 9 ccggc tccgaccacc 60 

<210> 29 
<211> 61 
<212> DNA 

<213> Artificial Sequence 
<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

^cacctg getgctt»ac cdccdccdct ggcdctgccg gctggcgcta tgctgctcct^O 
c 

<210> 30
<211> 62
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

«ccgcc?ct agagaattct tartacagrg thgghgccag gaggagcagc atagc.ccag 60 



<210> 31 
<211> 58 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

aa^cagccag gtgagcagcg tchggratrg argthgcggt ggtcggagcc ggcaggtt 58 

<210> 32 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

<400> 32 tccttcaaca ccgtgaacca 60 

.^gcaaccagc gaaagccatg atrtghgcha craargtytc tcctcca 

<210> 33 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

gcgaaaacag caacgtcttc rccrccrtgr gtytcrgahg cctgcggaac agcagcctgc 60 

<210> 34 
<211> 60 
<212> DNA 



<213> Artificial Sequence



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

SStil ccattaacot chggrcgrga rccrccrccc ag-gcgtaac ccggaccgtt 60 

<210> 35 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aagatttaga gtccagagct ttrgahgghg ccagrccraa gatagaggta ccacgcaggg 60 

<210> 36 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

acgJgagagt ggtcagcggt haccagratc agrgtrtcca gttcagaggt cagttcgtta 60 

<210> 37 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

gaacatacca gcttcggtca ghgccatrta fcgcyttrtcg tcgtggtgac cgtggtcgat 60 

<210> 38
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

gg 0 t °a,aaacc acgcgggtta cgrga.acha crcgcagfcgc aacttcggtc atttcctgca 60 

<210> 39 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 39 

tcctgctgaa cgttgtattt catrtchgch ggytcraaca gacccatcag gtgggtaaca 

<210> 40 
<211> 60 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 40 

cagcagagcg gtacggttcc ahacrtaytg hgcrccytgg cgcttagcct gccaagcctg 

<210> 41 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 41 

tacgaacacc gttaacagaa gcrtcrtchg grtaytchgg gtccggggca ccaaccggga 

<210> 42 
<211> 60
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 42 

cccaggataa cgtcgatgtc catrttrtth accagytghg cagcgatgtc ctggcaaccg 

<210> 43 
<211> 60
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

caggLagcg tcagagtacc arttrcgrtt hacrgtrtga gcgtaagcac cagccggaga 60 

<210> 44 
<211> 60 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

^aclac acc.acagat tcrcchgcyt tyxthgcreg gttcat.aca gaggtaactt 60 

<210> 45 

<211> 60
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library

ca«gg«gt aacgagcagc hgcrgahacr ccratrgtrc ggtagttacc tttaacaccg 60 

<210> 46 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

accagcagag tccggaacct grcgrtchac rttrtargtt ttagacagag caacgtacgg 60 

<210> 47 
<211> 60 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

gggtttccgg acccagttta ccrttcatyt grccyttcag gatacgggta gcggtaacgg 60 

<210> 48 
<211> 60
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

cccaggaaca ggataacgtt ytthgcKgcr gtytgratng qctgcagttt tttagcaacg 60 

<210> 49 
<211> 42
<212> DNA

<213> Artificial Sequence 



<220> 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library

<400> 49 42 
acggttccag aaagccgggt cttectcttc aaccggaacc ag 

<210> 50 
<211> 60 
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

c«gagcaga cataacacca gchgchachg chachgccag cggcagttta cgcagggtga 60 

<210> 51 
<211> 62 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

accggggU acagcagcgg cagcaghgcc aghgcratrg trgactgttt catatgtata 60 
tc 

<210> 52 
<211> 59
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

gccggctgag cagccagcag cagcagrcch gchgchgcgg tcggcagcag gtagtttca 59 

<210> 53 
<211> 60 
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

aagagatagc gatcggggtg gtcaghacra trcccagcag tttagcacgc atatgtatat 60 

<210> 54 
<211> 58
<212> DNA 

<213> Artificial Sequence 



<220> 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for codon usage library 

caacggtagc gaaaccagcc aghgchaehg crathgcrat agcggttett ttcatatg 

<210> 55 
<211> 39 
<212> DNA 

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 55 

agaattctct agaggcggaa actctccaac tcccaggtt 

<210> 56 
<211> 39
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for codon usage library 

<400> 56 

tgagaggttg agggtccaat tgggaggtca aggcttggg 

<210> 57 
<211> 18 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 57 

tgtratctgy ctsagacc 

<210> 58 
<211> 23 
<212> DNA
<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 



<400> 58 

ggcacaaatg vgraagaatct etc 

<210> 59 
<211> 22 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate
oligonucleotide used for alpha interferon 
shuffling 

<400> 59 

agagattctk cbcatttgtg cc 

<210> 60 
<211> 24 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 60 

cagttccaga agrctsmagc catc 

<210> 61 
<211> 24 
<212> DNA 

<213> Artificial Sequence 



<400> 61 

gatggctksa gycttctgga actg

<210> 62 
<211> 19 
<212> DNA 

<213> Artificial Sequence 



<400> 62 
cttcaatctc ttcascaca

<210> 63 
<211> 19
<212> DNA 

<213> Artificial Sequence 




<220> 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for alpha interferon
shuffling




<220> 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for alpha interferon
shuffling




<400> 63 

tgtgstgaag agattgaag 



<210> 64 



<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 64 

ggawsagass cccctaga

<210> 65 
<211> 18
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 65 

tctaggagss tctswtcc 

<210> 66 
<211> 21 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 66 

gaacttdwcc agcaamtgaa t 

<210> 67 
<211> 21 
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate
oligonucleotide used for alpha interferon 
shuffling 

<400> 67 

attcakttgc tggwhaagtt c 
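Several of the alpha interferon shuffling oligonucleotides appear to be listed as complementary pairs: SEQ ID NO: 67 reads as the reverse complement of SEQ ID NO: 66, and SEQ ID NO: 71 as that of SEQ ID NO: 70. A minimal sketch that checks this, extending the complement rule to the IUPAC ambiguity codes used in the listing (R/Y, K/M, B/V and D/H swap; S, W and N are their own complements); the function name is illustrative:

```python
# Complement of each IUPAC nucleotide code; ambiguity codes map to the complementary base set.
IUPAC_COMPLEMENT = str.maketrans("acgtrykmswbdhvn", "tgcayrmkswvhdbn")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) sequence, ignoring block spacing."""
    s = "".join(seq.split()).lower()
    return s.translate(IUPAC_COMPLEMENT)[::-1]


seq66 = "gaacttdwcc agcaamtgaa t"
seq67 = "attcakttgc tggwhaagtt c"
seq70 = "aagaatcact ctttatct"
seq71 = "agataaagag tgattctt"

print(reverse_complement(seq66) == "".join(seq67.split()))  # True
print(reverse_complement(seq70) == "".join(seq71.split()))  # True
```

A check of this kind is also a convenient way to spot transcription errors in a listing, since a single altered base breaks the pairing.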

<210> 68 
<211> 19
<212> DNA 

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 



<400> 68 

ggactycatc ctggctgtg

<210> 69 
<211> 19 
<212> DNA 

<213> Artificial Sequence 

<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 69 

cacagccagg atgragtcc 

<210> 70 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 70 

aagaatcact ctttatct 

<210> 71 
<211> 18 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 71 

agataaagag tgattctt 

<210> 72 
<211> 19 
<212> DNA

<213> Artificial Sequence 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon 
shuffling 

<400> 72 

tgggaggttg tcagagcag 

<210> 73 
<211> 19 
<212> DNA 

<213> Artificial Sequence 



<220> 
<223> Description of Artificial Sequence: degenerate
oligonucleotide used for alpha interferon 
shuffling 

<400> 73

ctgctccgac aacctccca 

<210> 74 
<211> 18
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: degenerate 
oligonucleotide used for alpha interferon
shuffling 

<400> 74 

tcawtccttm ctcyttaa 

<210> 75 
<211> 166 
<212> PRT 

<213> consensus alpha interferon 

Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile

Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp
20 25 30

Arg His Asp Phe Gly Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe
35 40 45

Gln Lys Ala Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr
50 55 60

Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Ala Ala Trp Glu Gln Ser
65 70 75 80

Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu
85 90 95

Glu Ala Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met
100 105 110

Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr
115 120 125

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val
130 135 140

Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160

Arg Leu Arg Arg Lys Asp
165



<210> 76 



<211> 166
<212> PRT 

<213> human alpha interferon 



Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile
1 5 10 15



Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp

20 25 
Arg His Asp Phe Gly Leu Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe



35 



Gln Lys Thr Gln Ala Ile Pro Val Leu His Glu Met Ile Gln Gln Thr
50 55 60

Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser



65 70 75 80

Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asn Leu

85 90 

Glu Ala Cys Val Ile Gln Glu Val Gly Met Glu Glu Thr Pro Leu Met
100 105 110



Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 

130 135 
Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys



145 

Arg Leu Arg Arg Lys Asp 
165 



<210> 77 
<211> 166 
<212> PRT 

<213> human alpha interferon 



■STU'u. "° ™ » ls s « L » u °ix * sn *" *" AU l » Ile 

Leu LflU U. G!n Het Or, U. S.r Pro f M S.r Cy, l.u Lys »P 
20 25 

Arg Pro Asp Phe Gly Leu Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe

35 40 
Gln Lys Thr Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr
50 55 60



Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser
65 70 75 80



Leu Leu Glu Ly, * Ser Thr Glu Leu Tyr Gin Gin Leu < Asn Leu. 
95 

Glu Ala Cys Val Ile Gln Glu Val Gly Met Glu Glu Thr Pro Leu Met



Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr
115 120 125

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 
130 135 140

Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160

Ile Leu Arg Arg Lys Asp
165 



<210> 78 
<211> 166 
<212> PRT 

<213> human alpha interferon 

Cys Asn Leu Ser Gln Thr His Ser Leu Asn Asn Arg Arg Thr Leu Met
1 5 10 15

Leu Leu Ala Gln Met Arg Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp

20 25 
Arg His Asp Phe Glu Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe

35 40 
Gln Lys Ala Gln Ala Ile Ser Val Leu His Glu Met Met Gln Gln Thr

50 55 
Phe Asn Leu Phe Ser Thr Lys Asn Ser Ser Ala Ala Trp Asp Glu Thr
65 

Leu Leu Glu Lys Phe Tyr Ile Glu Leu Phe Gln Gln Met Asn Asp Leu
85 90 95

Glu Ala Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met
100 105 110



Asn Glu Asp Ser Ile Leu Ala Val Lys Lys Tyr Phe Gln Arg Ile Thr
115 120 125
Leu Tyr Leu Met Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val

130 135 140



Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160

Arg Leu Arg Arg Lys Asp 
165 



<210> 79 
<211> 166



<212> PRT 

<213> human alpha interferon 
<400> 79 



Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile

1 5 10 

Leu Leu Ala Gln Met Gly Arg Ile Ser His Phe Ser Cys Leu Lys Asp

20 25 30



Arg His Asp Phe Gly Phe Pro Glu Glu Glu Phe Asp Gly His Gln Phe
35 40 45 

Gln Lys Thr Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr
50 55 60 

Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser
65 70 75 80

Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu
85 90 

Glu Ala Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met
100 105 110 

Asn Val Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr
115 120 125

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val

130 135 140

Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160



Arg Leu Arg Arg Lys Asp 
165 



<210> 80 
<211> 166 
<212> PRT 

<213> human alpha interferon 

Cys Asp Leu Pro Gln Thr His Ser Leu Gly His Arg Arg Thr Met Met

Leu Leu Ala Gln Met Arg Arg Ile Ser Leu Phe Ser Cys Leu Lys Asp

20 25 
Arg His Asp Phe Arg Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe

35 40 
Gln Lys Ala Glu Ala Ile Ser Val Leu His Glu Val Ile Gln Gln Thr
50 55 
Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Val Ala Trp Asp Glu Arg
65 70 75 80

Leu Leu Asp Lys Leu Tyr Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu
85 90 



Glu Ala Cys Val Met Gln Glu Val Trp Val Gly Gly Thr Pro Leu Met

100 105 110

Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr

115 120 125
Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 

Arg Ala Glu Ile Met Arg Ser Phe Ser Ser Ser Arg Asn Leu Gln Glu



145 



Arg Leu Arg Arg Lys Glu 
165 



<210> 81 
<211> 166 
<212> PRT 

<213> human alpha interferon 



Cys Asp Leu Pro Gln Thr His Ser Leu Arg Asn Arg Arg Ala Leu Ile
1 5 10 15

Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp

20 25 
Arg His Glu Phe Arg Phe Pro Glu Glu Glu Phe Asp Gly His Gln Phe

35 40 
Gln Lys Thr Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr

50 55 60
Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser

Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu
85 90 

Glu Ala Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met
100 105 

Glu »p «. IU U. «• «l «. ** '*» £ ,U 
115 120 

Leu Tyr Leu Met Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val

130 135 

Arg Ala Glu Ile Met Arg Ser Phe Ser Phe Ser Thr Asn Leu Lys Lys

145 150 155 160



Gly Leu Arg Arg Lys Asp 
165 



<210> 82 
<211> 166
<212> PRT 

<213> human alpha interferon 



Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile

1 5 10

Leu Leu Ala Gln Met Arg Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp

20 25 30

Arg His Asp Phe Glu Phe Pro Gln Glu Glu Phe Asp Asp Lys Gln Phe

35 40 45

Gln Lys Ala Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr

50 55 60



Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Ala Ala Leu Asp Glu Thr
65 70 75 80

Leu Leu Asp Glu Phe Tyr Ile Glu Leu Asp Gln Gln Leu Asn Asp Leu
85 90 95

Glu Ser Cys Val Met Gln Glu Val Gly Val Ile Glu Ser Pro Leu Met
100 105 110 

Tyr Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr
115 120 125



Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Ser Cys Ala Trp Glu Val Val 

130 135 140 

Arg Ala Glu Ile Met Arg Ser Phe Ser Leu Ser Ile Asn Leu Gln Lys



145 150 155 160

Arg Leu Lys Ser Lys Glu 
165 



<210> 83 
<211> 166
<212> PRT 

<213> human alpha interferon 



Cys Asp Leu Pro Glu Thr His Ser Leu Asp Asn Arg Arg Thr Leu Met

1 5 10 15

Leu Leu Ala Gln Met Ser Arg Ile Ser Pro Ser Ser Cys Leu Met Asp

20 25 30



Arg His Asp Phe Gly Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe



35 40 

Gln Lys Ala Pro Ala Ile Ser Val Leu His Glu Leu Ile Gln Gln Ile
50 55 60



Phe Asn Leu Phe Thr Thr Lys Asp Ser Ser Ala Ala Trp Asp Glu Asp

65 70 75 80

Leu Leu Asp Lys Phe Cys Thr Glu Leu Tyr Gln Gln Leu Asn Asp Leu
85 90 



Glu Ala Cys Val Met Gln Glu Glu Arg Val Gly Glu Thr Pro Leu Met
100 105 110

Asn Ala Asp Ser Ile Leu Ala Val Lys Lys Tyr Phe Arg Arg Ile Thr
115 120 125

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val

130 135 140

Arg Ala Glu Ile Met Arg Ser Leu Ser Leu Ser Thr Asn Leu Gln Glu
145 150 155 160

Arg Leu Arg Arg Lys Glu 
165 



<210> 84 
<211> 166
<212> PRT 

<213> human alpha interferon 

Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile
1 5 10 15

Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp
20 25 

Arg His Asp Phe Gly Phe Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe
35 40 45

Gln Lys Ala Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr

50 55 60
Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Ala Ile Trp Glu Gln Ser

65 70 75 80

Leu Leu Glu Lys Phe Ser Thr Glu Leu Asn Gln Gln Leu Asn Asp Met
85 90 95

Glu Ala Cys Val Ile Gln Glu Val Gly Val Glu Glu Thr Pro Leu Met
100 105 



Asn Val Asp Ser Ile Leu Ala Val Lys Lys Tyr Phe Gln Arg Ile Thr

115 120 125

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 

Arg Ala Glu Ile Met Arg Ser Phe Ser Leu Ser Lys Ile Phe Gln Glu
145 150 155 160

Arg Leu Arg Arg Lys Ser 
165 



<210> 85 
<211> 166 
<212> PRT 

<213> human alpha interferon 



Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile

Leu Leu Ala Gln Met Gly Arg Ile Ser Pro Phe Ser Cys Leu Lys Asp
20 25 30 

Arg Pro Asp Phe Gly Leu Pro Gln Glu Glu Phe Asp Gly Asn Gln Phe
35 40 45 

Gln Lys Thr Gln Ala Ile Ser Val Leu His Glu Met Ile Gln Gln Thr
50 55 60 

Phe Asn Leu Phe Ser Thr Glu Asp Ser Ser Ala Ala Trp Glu Gln Ser
65 70 75 80

Leu Leu Glu Lys Phe Ser Thr Glu Leu Tyr Gln Gln Leu Asn Asn Leu
85 90 95 

Glu Ala Cys Val Ile Gln Glu Val Gly Met Glu Glu Thr Pro Leu Met
100 105 110

Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr
115 120 125 

Leu Tyr Leu Thr Glu Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 
130 135 140

Arg Ala Glu Ile Met Arg Ser Leu Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160

Ile Leu Arg Arg Lys Asp
165 



<210> 86 
<211> 166 
<212> PRT 

<213> human alpha interferon

Cys Asp Leu Pro Gln Thr His Ser Leu Gly Asn Arg Arg Ala Leu Ile
1 5 10 15



Leu Leu Ala Gln Met Gly Arg Ile Ser His Phe Ser Cys Leu Lys Asp



20 25 



Arg Tyr Asp Phe Gly Phe Pro Gln Glu Val Phe Asp Gly Asn Gln Phe
35 40 45

Gln Lys Ala Gln Ala Ile Ser Ala Phe His Glu Met Ile Gln Gln Thr
50 55 60 

Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser Ala Ala Trp Asp Glu Thr
65 70 75 80

Leu Leu Asp Lys Phe Tyr Ile Glu Leu Phe Gln Gln Leu Asn Asp Leu
85 90 

Glu Ala Cys Val Thr Gin Glu Val Gly Val Glu Glu He JlJ tma Met 
100 105 110



Asn Glu Asp Ser Ile Leu Ala Val Arg Lys Tyr Phe Gln Arg Ile Thr

115 120 125



Leu Tyr Leu Met Gly Lys Lys Tyr Ser Pro Cys Ala Trp Glu Val Val 

130 135 140

Arg Ala Glu Ile Met Arg Ser Phe Ser Phe Ser Thr Asn Leu Gln Lys
145 150 155 160



Gly Leu Arg Arg Lys Asp 
165 



<210> 87 
<211> 501
<212> DNA 

<213> consensus alpha interferon 



^gat«gc ctcagaccca cagcctgggt aataggaggg «»gatact "tggcacaa 60 
a?gggaagaa tctctccttt ctcctgcctg aaggacagac Jtgactttgg a«tccccag 10 

atacaggagg ttggggtgga agagactccc ctgatgaatg cccttgtgcc 420 

aggaaatact tccaaagaat cactctttat ctgacagaga agaaatacag cc » 9 

tgggaggttg tcagagcaga aatcatgaga tccttctctt tttcaacaaa cttgc ^ 
agattaagga ggaaggattg a 

<210> 88 
<211> 501 
<212> DNA 

<213> human alpha interferon 

^gatc'tgc ctcagaccca cagcctgggt aataggaggg ecjtjjtjct cctg g cacaa ^ 

a?gggaagaa tctctccttt ctcctgcptg ^agac ^ lM 

gaggagtttg atggcaacca gttccagaag actcaagcca « » ggaacagagc 240 

Kccagcagl ccttcaatct cttcagcaca gaggactcat =jgctgcttg 2QQ 

ctcctagaaa aattttccac tgaactttac "gcaactga ataacctgga ag g g * ^ q 

Ttagaggagg ttgggatgga agajactccc aggactc^t "tgg^g ^ 

aggaaatact tccaaagaat cactctttat =taacagaga ag * cttgcaaaaa 480 

tgggaggttg tcagagcaga aatcatgaga tccctctctt tttcaaca * ^ 
agattaagga ggaaggattg a 



<210> 89 
<211> 501 
<212> DNA 

<213> human alpha interferon 



<400> 89 „ M1 , B a<no ecttgatact cctggcacaa 60 

tgtgatctgc ctcagaccca cagcctgggt "taggaggg acttccccag 120 

atgggaagaa tctctccttt ctcctgcctg aaggacagac ctga w ccatga g atq i 8 0 
gaggagtttg atggcaacca gttccagaag actcaagcca ctqctgcttg ggaacagagc 240 
atccagcaga ccttcaatct cttcagcaca gaggactcat !S«Uw agcatgtgtg 300 
ctcctagaaa aattttccac tgaactttac "gcaactga ataacctgg^ ^ ^ 3^ 
atacaggagg ttgggatgga agagactccc "gatgaatg JJJ cccttgtgcc 420 

aggaaatact tccaaagaat cactctttat ="acagaga !J!! aacaaa cttgcaaaaa 480 
tgggaggttg tcagagcaga aatcatgaga tctctctctt tttcaaca * Ml 



atattaagga ggaaggattg a 



<210> 90 
<211> 501
<212> DNA

<213> human alpha interferon 
<400> 90 



egtMcetgt ctcaaaccca cagcctgaat aacaggagga ccctgatgct "tggcaca. 60 

atgaggagaa tctccccttc ctcccgcctg aaggacagac atgactttga "tcccccag l*u 

gag^atJtg atggcaacca gttccagaaa gcccaagcca tctctgtcct ce gagatg 80 

acgcagcaga ccttcaatct cttcagcaca aagaactca- ctgctgcccg ggaw* h 

cccccagaaa aattctacat tgaacttttc cagcaaatga atgacctgga J00 

acacaggagg ttggggtgga agagactccc «ga:gaatg SSSw S 

aagaaatact cccaaagaat cactctttat ctgacggaga agaaaca^a, « c "5£* 

"ggaggctg tcagagcaga aatcatgaga ceecsctctt tttcaacaaa ctcgcaaaaa 430 

agattaagga ggaaggattg a 

<210> 91 
<211> 501
<212> DNA 

<213> human alpha interferon 

tgJgatctgc ctcagaccca cagcctgggt aataggaggg ccttgatact "tggcacaa 60 
awggaagaa tctctccttt ctcatgcctg aaggacagac atgatttcgg attccccgag 120 
aaaoaatttg atggccacca gttccagaag actcaagcca tctctgtcct ccatgagatg 180 
aScalcagl cc«caatct cttcagcaca gaggacccat ctgctgcttg gg«cagagc "0 
ctc"ag«a aattttccac tgaactttac cagcaactga atgacctgga agcatgtgtg 300 
a acaggagg ttggggtgga agagactccc ctgatgaatg tggactccat cctgg'tgtg 360 
aggaaataet tccaaagaat cactctetat ctaacagaga agaaatacag "cttgcgcc 
Jgggaggttg tcagagcaga aatcatgaga tccctctcgt tttcaacaaa cttgcaaaaa 480 
agattaagga ggaaggattg a 

<210> 92 
<211> 501 
<212> DNA 

<213> human alpha interferon 

^tgatctgc ctcagaccca cagcctgggt cacaggagga ccatgatgct «^cacaa 60 

tgggaggttg tcagagcaga aatcatgaga tccttctctt catcaagaaa cttgcaag ^ 
aggttaagga ggaaggaata a 



<210> 93 
<211> 501 
<212> DNA 

<213> human alpha interferon 



U°£t£«jc ctcagaccca cagcctgcgt aataggaggg ccttgatact cctggcac JO 
atgggaagaa tctctccttt ctcctgcttg aaggacagac atgaattcag attccc g g 
gaggagtug atggccacca gttccagaag actcaagcca tctctgtcct ccatgag g 
atccagcaga ccttcaatct cttcagcaca gaggactcat ctgctgcttg ggaac , , 
ctcctagaaa aattttccac tgaactttac cagcaactga atgacctgga 
atacaggagg ttggggtgga agagactccc ctgatgaatg aggactccat cccgg 






aggaaacact ted agaat cactctttat ctaatggaga agaaataci. «c«g.tgcc 420 
tgggaggttg tcagagcaga aatcacgaga tcctcctcct tctcaacaaa ctcgaaaaaa 480 
ggattaagga ggaaggattg a 



<210> 94 
<211> 501
<212> DNA

<213> human alpha interferon 



<400> 94

tgcgatctgc ctcagactca cagcctgggt aacaggaggg ccccgacac. «-ggca.aa 
acgcgaagaa ccccccctec ctcctgcctg aaggacagac atgaccttga attcccccag 
gaggagtttg acgataaaca gttccagaag gcccaagcca tctccgtccc cca.gagacg 
itccagcaga cccccaaccc cttcagcaca aaggacccat ccgccgcttt 99«gagacc 240 
cctctagatg aaccctacat cgaacccgac cagcagctga acgacctgga g.cecgcgcg 300 
acgcaggaag tgggggtgac agagcccccc ctgacgaatg aggaccccac cctggccgcg 360 
agg«a?a« ccIIiagLc cactctatat ctgacagaga agaaatacag ctcttgtgcc 420 
cgggaggttg tcagagcaga aatcacgaga tccttctcct tatcaatcia cccgcaaaaa 480 



agactgaaga gcaaggaatg a 

<210> 95 
<211> 501 
<212> DNA

<213> human alpha interferon 






tatSatctcc ctgagaccca cagcctggat aacaggagga ccttgatgct cctggcacaa 60 
a?gagca,aa tcL?ccttc ctcctgtctg atggacagac atgactttgg atttccccag 20 
gaggag«tg atggcaacca gttccagaag gctccagcca tctctgtcct «atgagctg 180 
atccagcaga tcttcaacct cttctccaca aaagattcat ctgctgcttg Wjgaggac 240 
ctcctagaca aattctgcac cgaactctac cagcagctga atgacttgga agcctgtgt, 300 
acgcaggagg agagggtggg agaaactccc ctgatgtacg cggactccat "tggccgtg 360 
aagaaSaS tccaaagH? cactctctat ctgacagaga agaaatacag cccttgtgcc 20 
tgggaggttg tcagagcaga aatcacgaga tccctctctt tatcaacaaa cttgcaagaa 480 
agattaagga ggaaggaata a 

<210> 96 
<211> 501 
<212> DNA

<213> human alpha interferon 

tTtgatctgc ctcagaccca cagcctgggt aataggagg, ccttgatact -^gcacaa 60 
atgggaagaa tctctccttc ctcctgcctg aaggacagac atgacttcgg "tcccccaa i« 
gaggagtJtg atggcaacca gttccagaag gctcaagcca tctctgtcct "atgagatg 180 
'atccagcaga ccttcaatct cttcagcaca aaggactcat ctgctacttg gq»»«W= 
c""agaaa aattttccac tgaacttaac cagcagctga atgacatgga agcctgcgtg 300 
alacaggagg ttggggtgga agagactccc ctgatgaatg Wettttt cc^ctgtg 0 
aagaaatact tccaaagaat cactctttat ctgacagaga agaaatacag cecttgcgec 
tgggaggttg tcagagcaga aatcacgaga tccttccctc catcaaaaat ttttcaagaa ^ 
agattaagga ggaaggaacg a 



<210> 97 
<211> 501 
<212> DNA

<213> human alpha interferon 



<400> 97

tgtgatctgc ctcagaccca cagcctgggt aataggaggg ccttgatact cctggcacaa 60
atgggaagaa tctctccttt ctcctgcctg aaggacagac "gactttgg l80 
gaggagtttg atggcaacca gttccagaag actcaagcca tctctgtcct cc 



atccagcaga ccttcaatcc ctccagcaca gaggactcac ctgctgcceg ggaacagagc 240
ctcctagaaa aattttccac tgaactctac cagcaaccga ataacccgga agcatgcgtg 300
atacaggagg ttgggatgga agagactccc ctgatgaazg aggactccac ctcggccgcg 360
aggaaacacc cccaaagaac cactctttat ccaaeagaga agaaatacag ccctcgcgcc 420
Cgggaggttg tcagagcaga aatcacgaga tcteceeeec ctccaacaaa cttgcaaaaa 480
agatcaagga ggaaggatcg a



<210> 98
<211> 501
<212> DNA

<213> human alpha interferon

<400> 98

tgtgatccgc ctcagaccca cagcccgggt aacaggaggg rcctgacac: cctggcacaa 60
acgggaagaa tctctcattt ccectgcctg aaggacagac atgattccgg actcccccag 120
gaggcgcctg acggcaacca gctccagaag gcccaagcca tctccgccct ecatgagatg 180
atccagcaga ccttcaatct cctcagcaca aaggattca: ctgctgcceg ggatgagacc 240
cccccagaca aattctacat tgaacttttc cagcaaccga acgacccaga agcctgtgtg 300
acacaggagg ccggggcgga agagaccgcc ctgatgaatg aggaccccac cctggccgcg 360
aggaaatact ttcaaagaac cactctttat ctgatggaga agaaacacag cccttgtgcc 420
tgggaggttg tcagagcaga aatcatgaga tcettctctt tttcaacaaa cttgcaaaaa 480
ggateaagaa ggaaggatcg a
Interferon Figures

Protein sequences of interferon alphas to be shuffled

[Figure: alignment of the protein sequences (residues 1 to 166) of a consensus sequence and eleven interferon alphas, including alpha C, alpha 4B, alpha 6, alpha 7, alpha 8, alpha F and alpha WA.]
DNA sequences of interferon alphas to be shuffled

[Figure: alignment of the nucleotide sequences (positions 1 to 480) of a consensus sequence and eleven interferon alphas, including alpha C, alpha 4B, alpha 6, alpha 7, alpha 8, alpha F and alpha WA.]
Figure 2 Page 6 of 7 
(7/9) 



WO 98/27230 



PCI7US9704239 



[Figure, continued: alignment of the interferon alpha nucleotide sequences, positions 481 to 501.]